PROCESS FOR RULE-BASED INSURANCE UNDERWRITING SUITABLE FOR
USE BY AN AUTOMATED SYSTEM 

CROSS-REFERENCE TO RELATED APPLICATIONS 

The present application claims priority from U.S. Provisional Patent Application 
Serial No. 60/343,239, which was filed on December 31, 2001. 

BACKGROUND OF THE INVENTION 

The present invention relates to a process for underwriting insurance applications, and
more particularly to a process for underwriting insurance applications based on a
flexible, rule-based system that uses fuzzy logic.

Insurance underwriting is traditionally performed by one or more trained individuals. A
given application for insurance (also referred to as an "insurance application") may be 
compared against a plurality of underwriting standards set by an insurance company. 
The insurance application may be classified into one of a plurality of risk categories 
available for a type of insurance coverage requested by an applicant. The risk 
categories then affect a premium paid by the applicant, e.g., the higher the risk 
category, the higher the premium. A decision to accept or reject the application for 
insurance may also be part of this risk classification, as risks above a certain tolerance 
level set by the insurance company may simply be rejected. 

There can be a large amount of variability in the insurance underwriting process when 
performed by individual underwriters. Typically, underwriting standards cannot 
cover all possible cases and variations of an application for insurance. The 
underwriting standards may even be self-contradictory or ambiguous, leading to 
uncertain application of the standards. The subjective judgment of the underwriter 
will almost always play a role in the process. Variation in factors such as underwriter 
training and experience, and a multitude of other effects can cause different 
underwriters to issue different, inconsistent decisions. Sometimes these decisions can 
be in disagreement with the established underwriting standards of the insurance
company, while sometimes they can fall into a "gray area" not explicitly covered by
the underwriting standards. 

Further, there may be an occasion in which an underwriter's decision could still be 
considered correct, even if it disagrees with the written underwriting standards. This 
situation can be caused when the underwriter uses his/her own experience to 
determine whether the underwriting standards may or should be interpreted and/or 
adjusted. Different underwriters may make different determinations about when these 
adjustments are allowed, as they might apply stricter or more liberal interpretations of 
the underwriting standards. Thus, the judgment of experienced underwriters may be 
in conflict with the desire to consistently apply the underwriting standards. 

Most of the key information required for automated insurance underwriting is
structured and standardized. However, some sources of information may be
non-standard or not amenable to standardization. By way of example, an attending
physician statement ("APS") may be almost as unique as each individual physician.
Nevertheless, a significant fraction of applications may require the use of one or more
APS due to the presence of medical impairments, age of applicants, or other factors.
Without such key information, the application underwriting process cannot be
automated for these cases.

Conventional methods for dealing with some of the problems described above have 
included having human underwriters directly reading the APS. However, an APS 
document can be as long as several tens of pages. Therefore, the manual reading 
process, combined with note-taking and consulting other information, such as an 
underwriting manual or the like, can greatly extend the cycle-time for each 
application processed, increase underwriter variability, and limit capacity by 
preventing the automation of the decision process. 

Other drawbacks may also exist.

SUMMARY OF THE INVENTION

In an exemplary embodiment of the invention, a method for underwriting an 
insurance application comprises the steps of receiving a request to underwrite an 
insurance application, where the request includes information about at least one
application component, and evaluating the at least one application component based
on at least one rule associated with the at least one application component. The
method includes assigning a measurement to the at least one application component, 
and assigning the at least one application component to a specific component category 
out of a plurality of component categories based on the assigned measurement. 

In a further exemplary embodiment of the invention, a method for underwriting an 
insurance application comprises the steps of receiving a request to underwrite an 
insurance application, wherein the insurance application includes a plurality of 
application components, evaluating the plurality of application components based on 
a plurality of rules, wherein each of the plurality of rules is associated with each of the 
plurality of application components, assigning a measurement to each of the plurality 
of application components, and assigning each of the plurality of application 
components to a specific component category out of a plurality of component 
categories based on the assigned measurement. 

Another embodiment of the invention provides a method for underwriting an 
insurance application comprising the steps of receiving a request to underwrite an 
insurance application for a customer, wherein the insurance application includes a 
plurality of application components and evaluating the plurality of application 
components based on a plurality of rules, wherein each of the plurality of rules is 
associated with each of the plurality of application components. The method further 
provides for assigning a measurement to each of the plurality of application 
components, and assigning each of the plurality of application components to a 
specific component category out of a plurality of component categories based on the 
assigned measurement, assigning the insurance application to a specific application 
category out of a plurality of application categories, wherein the specific application
category is based at least in part on the specific component category, and issuing the
insurance application to the customer. 

Another exemplary embodiment of the invention provides for a computer program 
stored on a computer readable medium. The computer program causes a computer to 
perform the steps of receiving a request to underwrite an insurance application, 
wherein the insurance application includes at least one application component, 
evaluating the at least one application component based on at least one rule associated 
with the at least one application component, assigning a measurement to the at least 
one application component, and assigning the at least one application component to a 
specific component category out of a plurality of component categories based on the 
assigned measurement. 

In an additional embodiment of the invention, a method for underwriting a
transaction-oriented process comprises the steps of receiving a request to underwrite a
transaction, where the request includes information about at least one transaction
component, evaluating the at least one transaction component based on at least one
rule associated with the at least one transaction component, wherein the at least one
rule is a fuzzy logic rule, and assigning a measurement to the at least one transaction
component. Further, the method comprises assigning the at least one transaction
component to a specific component category out of a plurality of component 
categories based on the assigned measurement, and assigning the transaction to a 
specific transaction category out of a plurality of transaction categories, where the one 
specific transaction category is based at least in part on the specific component 
category. 

According to an exemplary embodiment, a computer program stored on a computer
readable medium causes a computer to act as a system comprising a receiver for
receiving a request to underwrite an insurance application, wherein the insurance
application includes at least one application component, an evaluation module for
evaluating the at least one application component based on at least one rule
associated with the at least one application component, and an assignment module
for assigning a measurement to the at least one application component, and assigning
the at least one application component to a specific
component category out of a plurality of component categories based on the assigned 
measurement. 

Another exemplary embodiment provides a computer program stored on a computer 
readable medium, wherein the computer program causes a computer to act as a system 
comprising means for receiving a request to underwrite an insurance application, 
wherein the insurance application includes at least one application component, means 
for evaluating the at least one application component based on at least one rule 
associated with the at least one application component, means for assigning a 
measurement to the at least one application component, and means for assigning the at 
least one application component to a specific component category out of a plurality of 
component categories based on the assigned measurement. 

BRIEF DESCRIPTION OF THE FIGURES 

Figure 1 is a graph illustrating a fuzzy (or soft) constraint, a function defining for each 
value of the abscissa the degree of satisfaction for a fuzzy rule, according to an 
embodiment of the invention. 

Figure 2 is a graph illustrating the measurements based on the degree of satisfaction 
for a collection of fuzzy rules, according to an embodiment of the invention. 

Figure 3 is a schematic representation of an object-oriented system to determine the 
degree of satisfaction for a collection of fuzzy rules, according to an embodiment of 
the invention. 

Figure 4 is a flowchart illustrating steps performed in a process for underwriting an 
insurance application using fuzzy logic according to an embodiment of the invention. 

Figure 5 is a flowchart illustrating steps for an inference cycle according to an 
embodiment of the invention. 



WO 03/065268 



PCT/US02/40464 



Figure 6 is a graph illustrating a fuzzy (or soft) constraint, a function defining for each 
value of the abscissa the degree of satisfaction for a rule comparing similar cases, 
according to an embodiment of the invention. 

Figure 7 is a graph illustrating the core of a fuzzy (or soft) constraint, according to an 
embodiment of the invention. 

Figure 8 is a graph illustrating the support of a fuzzy (or soft) constraint, according to 
an embodiment of the invention. 

Figure 9 is a graph illustrating the rate class histogram derived from a set of retrieved 
cases, according to an embodiment of the invention. 

Figure 10 is a chart illustrating the distribution of similarity measures for a set of 
retrieved cases, according to an embodiment of the invention. 

Figure 11 is a table illustrating a linear aggregation of rate classes, according to an 
embodiment of the invention. 

Figure 12 is a flowchart illustrating the steps performed in a process for determining 
the degree of confidence of an underwriting decision based on similar cases, 
according to an embodiment of the invention. 

Figure 13 is a process map illustrating a decision flow, according to an embodiment 
of the invention. 

Figure 14 illustrates a comparison matrix, according to an embodiment of the 
invention. 

Figure 15 illustrates a distribution of classification distances for each bin containing a 
range of retrieved cases, according to an embodiment of the invention. 

Figure 16 illustrates a distribution of normalized percentage of classification distances 
for each bin containing a range of retrieved cases, according to an embodiment of the 
invention.

Figure 17 illustrates a distribution of correct classification for each bin containing a
range of retrieved cases, according to an embodiment of the invention. 

Figure 18 illustrates a distribution of a performance function for each bin containing a 
range of retrieved cases, according to an embodiment of the invention. 

Figure 19 illustrates a distribution of a performance function for each bin containing
a range of retrieved cases, after removing negative numbers and normalizing the
values between 0 and 1, according to an embodiment of the invention.

Figure 20 illustrates results of a plot of the preference function (derived from Figure 
19) according to an embodiment of the invention. 

Figure 21 illustrates a computation of coverage and accuracy according to an 
embodiment of the invention. 

Figure 22 is a schematic representation of a system for underwriting according to an 
embodiment of the invention. 

Figure 23 is a flowchart illustrating the steps performed for executing and manipulating
a summarization tool according to an embodiment of the invention. 

Figure 24 illustrates a graphic user interface for a summarization tool for a general 
form according to an embodiment of the invention. 

Figure 25 illustrates a graphic user interface for a summarization tool for a
condition-specific form according to an embodiment of the invention.

Figure 26 illustrates an optimization process according to an embodiment of the 
invention. 

Figure 27 illustrates an example of an encoded population at a given generation 
according to an embodiment of the invention. 

Figure 28 illustrates a process schematic for an evaluation system according to an 
embodiment of the invention. 

Figure 29 illustrates an example of the mechanics of an evolutionary process 
according to an embodiment of the invention. 

Figure 30 is a graph illustrating a linear penalty function used in the evaluation of the 
accuracy of the CBE, according to an embodiment of the invention. 

Figure 31 is a graph illustrating a nonlinear penalty function used in the evaluation of 
the accuracy of the CBE, according to an embodiment of the invention. 

DETAILED DESCRIPTION OF THE INVENTION 

Reference will now be made in detail to the present preferred embodiments of the 
invention, examples of which are illustrated in the accompanying drawings in which 
like reference characters refer to corresponding elements. 

RULES BASED REASONING 

As stated above, a process and system is provided for insurance underwriting which is
able to incorporate all of the rules in the underwriting standards of a company, while
being robust, accurate, and reliable. According to an embodiment of the invention,
the process and system provided may be suitable for automation. Such a process and
system may be flexible enough to adjust the underwriting standards when appropriate.
As mentioned above, each individual underwriter may have his/her own set of
interpretations of underwriting standards about when one or more adjustments should
occur. According to an embodiment of the present invention, rules may be
incorporated while still allowing for adjustment using a fuzzy logic-based system. A
fuzzy logic-based system may be described as a formal system of logic in which the
traditional binary truth-values "true" and "false" are replaced by real numbers on a
scale from 0 to 1. These numbers are absolute values that represent intermediate
truth-values for answers to questions that do not have simple true or false, or yes or no
answers. In standard binary logic, a given rule is either satisfied (with a degree of
satisfaction of 1), or not (with a degree of satisfaction of 0), creating a sharp boundary
between the two possible degrees of satisfaction. With fuzzy logic, a given rule may
be assigned a "partial degree of satisfaction", a number between 0 and 1, in some
boundary region between a "definite yes" and a "definite no" for the satisfaction of a
given rule. Each rule is composed of a conjunction of conditions. Each
condition is represented by a fuzzy set A(x), which can be interpreted as a degree
of preference induced by a value x for satisfying a condition A. An inference engine
determines a degree of satisfaction of each condition and an overall degree of
satisfaction of a given rule.

For the purposes of illustration, imagine that a hypothetical life insurance company
has a plurality of risk categories, which are identified as "cat1", "cat2", "cat3", and
"cat4." In this example, a rating of cat1 is a best or low risk, while cat4 is considered
a worst or high risk. An applicant for an insurance policy would be rejected if he/she
fails to be placed in any category. An example of a type of rule laid out in a set of
underwriting guidelines could be, "The applicant may not be in cat1 if his/her
cholesterol value is higher than X1." Similarly, a cholesterol value of X2 could be a
cutoff for cat2, and so on. However, it is possible that a cholesterol reading of one
point over X1 may not in practice disqualify the applicant from the cat1 rating, if all
of the other rules are satisfied for cat1. It may be that readings of one point over X1
are still allowable, and so on. To define a fuzzy rule, two parameters, X1a and X1b,
may be needed. When the applicant's cholesterol is below X1a, a fuzzy rule may be
fully satisfied (e.g., a degree of satisfaction of 1). By way of present example, X1
from the above may be used as X1a. A parameter X1b may be a cutoff above which
the fuzzy rule is fully unsatisfied (e.g., a degree of satisfaction of 0). For example, it
may be determined from experienced underwriters of the insurance company that
under no circumstances can the applicant get the cat1 rating if his/her cholesterol is
above 190 (X1) by more than four points. In that situation, the fuzzy rule may use
X1a = X1, that is 190, and X1b = X1a + 4, that is 194. Other settings may be used.
X1a and X1b are parameters of the model. To obtain the partial degree of satisfaction
when the cholesterol value falls within the range [X1a, X1b], a continuous switching
function may be used, which interpolates between the values 1 and 0. The simplest
such function is a straight line, as disclosed in Fig. 1, but other forms of interpolation
may also be used.

Turning to cat2, cat3, and cat4, there may be a different cholesterol rule for each
category, which states that the applicant may not be placed in that category if his/her
cholesterol is higher than X2, X3, or X4, respectively. The same procedures may be
used, turning each rule into a fuzzy logic rule by assigning high and low cutoff values
(e.g., X2a, X2b; X3a, X3b; X4a, X4b). Thus, by way of continuing the example, cat2
may be associated with a fuzzy rule that uses X2a = X2 and X2b = X2 + 4, where X2
= 195 (for cat2). In addition, X3a = X3 and X3b = X3 + 4, where X3 = 200 (for cat3),
and X4a = X4 and X4b = X4, where X4 = 205 (for cat4). Other parameters also may
be used. Similarly, one would proceed through each rule in the underwriting
guidelines, allowing for fuzzy partial degrees of satisfaction. In the present invention,
each piece of data may be judged many times on the basis of each rule.
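
By way of illustration only, the fuzzy cholesterol rules described above may be
sketched in Python as follows. The names degree_of_satisfaction, CHOLESTEROL_CUTOFFS
and cholesterol_scores are hypothetical and chosen for this sketch; the cutoff values
are those of the running example, and the straight-line interpolation is the simplest
switching function mentioned above, not the only possible one.

```python
def degree_of_satisfaction(x, full, zero):
    """Linear fuzzy constraint: 1 at or below `full`, 0 at or above `zero`."""
    if x <= full:
        return 1.0
    if x >= zero:
        return 0.0
    # Straight-line interpolation from 1 (at `full`) down to 0 (at `zero`).
    return (zero - x) / (zero - full)


# (Xna, Xnb) pairs from the running example; the dictionary itself is illustrative.
CHOLESTEROL_CUTOFFS = {
    "cat1": (190.0, 194.0),
    "cat2": (195.0, 199.0),
    "cat3": (200.0, 204.0),
    "cat4": (205.0, 205.0),  # X4b = X4 as stated in the text: a crisp cutoff
}


def cholesterol_scores(value):
    """Judge one cholesterol reading against the rule of every category."""
    return {
        category: degree_of_satisfaction(value, full, zero)
        for category, (full, zero) in CHOLESTEROL_CUTOFFS.items()
    }


for reading in (189, 191, 197, 206):
    print(reading, cholesterol_scores(reading))
```

Because the two boundary tests return before the division is reached, a rule whose
two cutoffs are equal behaves as an ordinary Boolean threshold in this sketch.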

Once each fuzzy rule in the rule set has been applied, a decision is made as to which
category the applicant belongs. For each risk category, there may be a subset of rules
that apply to that category. In order to judge whether the applicant is eligible for the
given category, some number of aggregation criteria may be applied. To be concrete,
using the above hypothetical case, take the subset of all rules that apply to cat1. There
will be a fuzzy degree of satisfaction for every rule, where the set of degrees of
satisfaction is called {DS-cat1}. According to an embodiment of the invention, if any
of the degrees of satisfaction are zero, then the applicant may be ruled out of cat1.
Thus, one of the aggregation criteria may be, "reject from cat1 if MIN({DS-cat1})
<= A1," where A1 is a chosen constant, and the notation MIN(...) denotes selection
of the smallest value out of the set. One choice for A1 may be 0.5, but other choices
may be used. By way of another example, the choice A1 = 0.7 may also be used.
Again, the constant A1 may be considered as a parameter of the model, which may be
determined.
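
The MIN criterion above may be sketched as follows. This is a minimal illustration,
not a definitive implementation: the function name rejected_by_min and the rule names
and degrees of satisfaction in ds_cat1 are invented for the example, while the two
values of A1 are the ones mentioned in the text.

```python
def rejected_by_min(degrees, a1=0.5):
    """Criterion 'reject from the category if MIN({DS}) <= A1'."""
    return min(degrees.values()) <= a1


# Hypothetical degrees of satisfaction for the rules that apply to cat1.
ds_cat1 = {"cholesterol": 0.6, "systolic_bp": 1.0, "diastolic_bp": 0.9}

print(rejected_by_min(ds_cat1, a1=0.5))  # smallest degree 0.6 is above 0.5
print(rejected_by_min(ds_cat1, a1=0.7))  # smallest degree 0.6 is not above 0.7
```

Raising A1 makes the criterion stricter: the same applicant passes with A1 = 0.5 and
is ruled out of cat1 with A1 = 0.7.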

As another aggregation rule, by way of example, if very many of the rules have partial
degrees of satisfaction of 0.9, then too much adjusting may be occurring, and the
applicant may be ruled out of cat1, even though the aggregation rule, MIN({DS-cat1})
<= A1, may not be satisfied. The missing score (MS) is determined from the degree
of satisfaction (DS) by MS = 1 - DS. If a given fuzzy rule has DS = 0.9, then it would
have a missing score of 0.1. The aggregation criterion for this case might take the
form, "reject from cat1 if SUM({MS-cat1}) >= A2," where A2 is a different chosen
constant, the notation SUM(...) denotes summation of all the elements of the set, and
{MS-cat1} is the set of "missing scores" for each rule. The aggregation criterion above
may use the sum of all of the missing scores for the cat1 rules as a measure to
determine when too much adjusting has been done, comparing that with the constant
A2. The measure defined above, SUM({MS-cat1}), may be interpreted as a measure
proportional to the difference between the degree of complete satisfaction of all rules
and the average degree of satisfaction of each rule in {DS-cat1}. It is understood in this
invention that there may be any number of different kinds of aggregation criteria, of
which the above two are only specific examples.
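
The missing-score criterion may be sketched in the same spirit. The value A2 = 0.5 and
the list of degrees of satisfaction are assumptions made only for this example (the
text leaves A2 open), and the function names are hypothetical. The final check
illustrates the stated proportionality: for n rules, SUM({MS}) equals n times the
difference between 1 and the average degree of satisfaction.

```python
import math


def missing_scores(degrees):
    """MS = 1 - DS for every rule."""
    return [1.0 - ds for ds in degrees]


def rejected_by_sum_missing(degrees, a2):
    """Criterion 'reject from the category if SUM({MS}) >= A2'."""
    return sum(missing_scores(degrees)) >= a2


A2 = 0.5  # hypothetical constant, chosen only for illustration

# Six rules are each slightly adjusted (DS = 0.9); two are fully satisfied.
ds_cat1 = [0.9] * 6 + [1.0] * 2
total_missing = sum(missing_scores(ds_cat1))

# The MIN criterion with A1 = 0.5 would not reject this applicant,
# but the accumulated adjustment does exceed A2.
print(min(ds_cat1), total_missing, rejected_by_sum_missing(ds_cat1, A2))

# SUM({MS}) = n * (1 - average DS), up to floating-point rounding.
n = len(ds_cat1)
print(math.isclose(total_missing, n * (1.0 - sum(ds_cat1) / n)))
```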

In a further step, the results of applying the aggregation criteria to the set of rules 
relating to each category may be compared. A result according to one example may 
be that the applicant is ruled out of catl and cat2, but not from cat3 or cat4. In that 
case, assuming that the insurance company's policy was to place applicants in the best 
possible risk category, the final decision would be to place the applicant in cat3. 
Other results may also be obtained. 
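
A minimal sketch of this final comparison is given below, assuming the company places
each applicant in the best category from which he/she has not been ruled out. The
function name place_applicant is hypothetical; returning None stands for rejection of
an applicant who cannot be placed in any category.

```python
CATEGORIES = ["cat1", "cat2", "cat3", "cat4"]  # ordered from best to worst risk


def place_applicant(ruled_out):
    """Return the best category not ruled out, or None if the applicant is rejected."""
    for category in CATEGORIES:
        if category not in ruled_out:
            return category
    return None


print(place_applicant({"cat1", "cat2"}))  # the example from the text
print(place_applicant(set(CATEGORIES)))   # ruled out everywhere: rejected
```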

As stated above, this fuzzy logic system may have many parameters that may be
freely chosen. It should be noted that the fuzzy logic system may extend and
therefore subsume a conventional (Boolean) logic system. By setting the fuzzy logic
system parameters to have only crisp thresholds (in which the core value is equal to
the support), the Boolean rules may be represented as a special case of fuzzy rules.
Those parameters may be fit to reproduce a given set of decisions, or set by
management in order to achieve certain results. By way of one example, a large set of
cases may be provided by the insurance company as a standard to be reproduced as
closely as possible. Preferably in such an example, there may be many cases, and the
parameters may be fit so as to minimize the error between the fuzzy rules model and
the supplied cases.
Optimization techniques such as logistic regression, genetic algorithms, Monte Carlo,
etc., also may be used to find an optimal set of parameters. By way of another
example, some of the fuzzy rules may be determined directly by the management of
the insurance company. This may be done through knowledge engineering sessions
with experienced underwriters, by actuaries acting on statistical information related to
the risk being insured, or by other manners. In fact, when considering maintenance of
the system, initial parameters may be chosen using optimization versus a set of cases,
while at a future time, as actuarial knowledge changes, these facts may be used to
directly adjust the parameters of the fuzzy rules. New fuzzy rules may be added, or
aggregation rules may change. The fuzzy logic system can be kept current, allowing
the insurance company to implement changes quickly and with zero variability,
thereby providing a process and system that is flexible.
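
The idea of fitting parameters to a set of supplied cases may be illustrated with a
deliberately small sketch. It uses an exhaustive grid search, which is a stand-in
chosen for brevity and is not one of the optimization techniques named above. The
cases are made-up toy data, not real underwriting decisions, and all names are
hypothetical. Only one parameter is fit: the width of the boundary region above
X1a = 190 for a single cholesterol rule, with A1 fixed at 0.5.

```python
def degree_of_satisfaction(x, full, zero):
    """Linear fuzzy constraint: 1 at or below `full`, 0 at or above `zero`."""
    if x <= full:
        return 1.0
    if x >= zero:
        return 0.0
    return (zero - x) / (zero - full)


X1A = 190.0  # lower cutoff from the running example
A1 = 0.5     # MIN-criterion constant from the running example

# Toy cases invented for this sketch: (cholesterol, underwriter placed in cat1?)
CASES = [(185, True), (190, True), (191, True), (192, True),
         (193, False), (195, False), (200, False)]


def disagreements(width):
    """Count cases where the one-rule model and the supplied decision differ."""
    errors = 0
    for cholesterol, accepted in CASES:
        ds = degree_of_satisfaction(cholesterol, X1A, X1A + width)
        predicted = ds > A1  # not rejected by 'MIN({DS}) <= A1'
        errors += int(predicted != accepted)
    return errors


# Try every integer width from 0 (a crisp Boolean rule) to 8 points.
best_width = min(range(0, 9), key=disagreements)
print(best_width, disagreements(best_width))
```

A width of 0 reproduces the crisp Boolean rule, so the search also shows how many of
the supplied decisions a purely Boolean threshold would fail to reproduce.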

According to one embodiment of the invention, the fuzzy logic parameters may be
entered into a spreadsheet to evaluate the fuzzy rules for one case at a time. This may
be essentially equivalent to implementation in a manual processing type environment.
Fig. 2 is a graphical representation illustrating a plurality of measurements based on a
degree of satisfaction for a rule. A graphical user interface (GUI) 200 displays the
degree of satisfaction for one or more rules. GUI 200 includes a standard toolbar 202,
which may enable a user to manipulate the information in known manners (e.g.,
printing, cutting, copying, pasting, etc.). According to an embodiment of the
invention, GUI 200 may be presented over a network using a browser application such
as Internet Explorer®, Netscape Navigator®, etc. An address bar 204 may enable the
user to indicate what portion is displayed. A chart 206 displays various insurance
decision components and how each insurance decision component satisfies its
associated rule. A plurality of columns 208 illustrates a plurality of categories for each
decision component, as well as a plurality of parameters for each decision component.
A column 210 identifies the actual parameters of the potential applicant for insurance
and a plurality of columns 212 illustrate a degree of satisfaction of each rule. By way
of example, a row 214 is labeled BP (Sys), corresponding to a systolic blood pressure
rule. To receive the Best or Preferred category classification, the applicant must have
a systolic blood pressure score (score) between 140 and 150. To receive a Select
category classification, the applicant must have a score between 150 and 155, while a
score of 155 or more receives a "Standard Plus" or St. Plus category classification. In
this example, the applicant has a score of 151. The columns 212 show zero
satisfaction of the rule for the Best and Preferred category classifications.
Additionally, Fig. 2 shows that the applicant slightly missed satisfaction for the Select
category, and shows Perfect Constraint Satisfaction for the St. Plus category.

In another example, a row 216 is labeled BP (Dia.), corresponding to a diastolic blood 
pressure rule. To receive a Best category classification, the applicant must have a 
diastolic blood pressure score (score) between 85 and 90, between 90 and 95 for a 
Preferred category classification, between 90 and 95 for the Select category 
classification, and between 95 and 100 for the St. Plus category classification. Here,
the applicant has a score of 70, resulting in Perfect Constraint Satisfaction in all of the 
columns 212. 
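
The two blood pressure rows may be reproduced with a short sketch. It rests on an
assumption about how to read the chart: each stated pair "between a and b" is taken
as the two cutoffs of a linear fuzzy rule, fully satisfied at or below a and fully
unsatisfied at or above b. The St. Plus entry of the systolic row is omitted because
the text does not state both of its cutoffs. All names are hypothetical.

```python
def degree_of_satisfaction(x, full, zero):
    """Linear fuzzy constraint: 1 at or below `full`, 0 at or above `zero`."""
    if x <= full:
        return 1.0
    if x >= zero:
        return 0.0
    return (zero - x) / (zero - full)


# Cutoff pairs as read from the description of rows 214 and 216 (an interpretation).
ROWS = {
    "BP (Sys)": {"Best": (140, 150), "Preferred": (140, 150), "Select": (150, 155)},
    "BP (Dia.)": {"Best": (85, 90), "Preferred": (90, 95),
                  "Select": (90, 95), "St. Plus": (95, 100)},
}
APPLICANT = {"BP (Sys)": 151, "BP (Dia.)": 70}


def chart(rows, applicant):
    """Degree of satisfaction per row and category, as in columns 212."""
    return {
        row: {cat: degree_of_satisfaction(applicant[row], full, zero)
              for cat, (full, zero) in cutoffs.items()}
        for row, cutoffs in rows.items()
    }


for row, scores in chart(ROWS, APPLICANT).items():
    print(row, scores)
```

Under this reading, the systolic score of 151 gives zero satisfaction for Best and
Preferred and a degree of 0.8 for Select, which is consistent with the "slightly
missed" indication, while the diastolic score of 70 fully satisfies every category.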

By way of a further example, a row 218 is labeled Nicotine, where a score between 4 
and 5 receives the Best category classification, a score between 2.5 and 3 receives the 
Preferred category classification, a score between 1.5 and 2 receives the Select 
category classification, and a score between 0.7 and 1 receives the St. Plus category 
classification. In this example, the applicant has a score of 4.2. Thus, a score of 
"Mostly Missing" is indicated under the Best category of a column 212, while a score 
of Perfect Constraint Satisfaction is indicated for all others. 

GUI 200 presents a submit button 220 to enable the user to accept a decision and 
submit it to a database. Alternatively, the user may decide not to accept the decision. 
The user may activate a next button 222 to record his/her decision. Other methods for 
display may also be used. 

According to another embodiment of the invention, the rules may be encoded into
Java-based computer code, which can query a database to obtain the case parameters,
and write its decision in the database as well. The object model of the Java
implementation is illustrated in Fig. 3. This Java implementation may be suitable for
batch processing, or for use in a fully automated underwriting environment.
According to an embodiment of the invention, a rule engine (class RuleEngine) 302
may control the system. The decision components of rule engine 302 may
be composed of several rules (class Rule) 304, several aggregations (class
Aggregation) 306 and zero or one decision post-processors (class
DecisionPostProcessor) 308. A Rule object 304 may represent the fuzzy logic for one
or a group of variables. Each rule is further composed of a number of rateclasses (class
Rateclass) 310. A Rateclass object 310 defines the rules for a specific rateclass.
According to an embodiment of the invention, a Rateclass object 310 may comprise
two parts. The first is pre-processing (class Preprocessor) 312, which may process
multiple inputs to form one output. The second is post-processing (class
Postprocessor) 314, which may take the result of the pre-processing, feed it to a fuzzy
function and get a fuzzy score. Some of the rules may be conditional, such as the
variable blood pressure systolic, where the thresholds vary depending on the age of
the applicant. Class Condition 316 may represent such a condition, if there is any.
Classes FixedScore 318, Minimal and Maximal may define some special
preprocessing functions, and class Linear 320 may define the general linear fuzzy
function as illustrated in Fig. 1.
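
The two-part structure of a rateclass may be sketched as follows. The sketch is
written in Python for brevity and is not the Java implementation of Fig. 3; it only
mirrors the roles described above. The age bands, the threshold values, the choice of
"highest of several readings" as the pre-processing step, and all function and field
names are assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import Any, Callable, Dict, Tuple

Case = Dict[str, Any]


def linear(value: float, full: float, zero: float) -> float:
    """General linear fuzzy function: 1 up to `full`, 0 from `zero` onward."""
    if value <= full:
        return 1.0
    if value >= zero:
        return 0.0
    return (zero - value) / (zero - full)


@dataclass
class Rateclass:
    name: str
    preprocess: Callable[[Case], float]                # many inputs -> one value
    thresholds: Callable[[Case], Tuple[float, float]]  # may depend on a condition

    def score(self, case: Case) -> float:
        value = self.preprocess(case)       # pre-processing
        full, zero = self.thresholds(case)  # the condition selects the thresholds
        return linear(value, full, zero)    # post-processing: fuzzy score


def highest_reading(case: Case) -> float:
    """Hypothetical pre-processor: reduce several readings to the largest one."""
    return max(case["systolic_readings"])


def thresholds_by_age(case: Case) -> Tuple[float, float]:
    """Hypothetical condition: older applicants get more lenient thresholds."""
    return (140.0, 150.0) if case["age"] < 60 else (150.0, 155.0)


best = Rateclass("Best", highest_reading, thresholds_by_age)
print(best.score({"age": 45, "systolic_readings": [138, 151]}))
print(best.score({"age": 65, "systolic_readings": [138, 151]}))
```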

According to an embodiment of the invention, there may be two phases at runtime for
rule engine 302. The first phase may be initialization. In this phase, the rule
definition file in XML format configures the rule engine. All the rule engine
parameters are defined in this phase, for example, the number of rules, the fuzzy
thresholds, pre- and post-processing, the aggregation operations (including class
Intersection 322 and class SumMissing 324) and class ThresholdLevel 326. The second
phase may be scoring. After correct initialization, the fireEngine method in rule
engine 302 may take an input parameter, an instance of class Case 328 containing
all the required variable values, and output an instance of class Result 330, which
encapsulates all the decision results, including rateclass placement, the fuzzy scores
for each variable and each rateclass, and the aggregation scores. Class ResultLogger
332 may log the output. Other object models for a Java implementation may also be
used.
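
The two runtime phases may be sketched as follows, again in Python rather than Java
and without any claim to reproduce the actual implementation. The XML element and
attribute names are invented for this example, since the format of the rule
definition file is not given here. The object model is also flattened (rules are
grouped under each rateclass), and the Intersection aggregation is assumed to be the
minimum of the fuzzy scores.

```python
import xml.etree.ElementTree as ET

# Hypothetical rule definition file; the schema is an assumption of this sketch.
RULES_XML = """
<ruleEngine minThreshold="0.5">
  <rateclass name="cat1">
    <rule variable="cholesterol" full="190" zero="194"/>
    <rule variable="systolic" full="140" zero="150"/>
  </rateclass>
  <rateclass name="cat2">
    <rule variable="cholesterol" full="195" zero="199"/>
    <rule variable="systolic" full="150" zero="155"/>
  </rateclass>
</ruleEngine>
"""


def linear(value, full, zero):
    """General linear fuzzy function: 1 up to `full`, 0 from `zero` onward."""
    if value <= full:
        return 1.0
    if value >= zero:
        return 0.0
    return (zero - value) / (zero - full)


class RuleEngine:
    def __init__(self, xml_text):
        # Phase 1, initialization: every parameter comes from the XML definition.
        root = ET.fromstring(xml_text)
        self.min_threshold = float(root.get("minThreshold"))
        self.rateclasses = [
            (rc.get("name"),
             [(r.get("variable"), float(r.get("full")), float(r.get("zero")))
              for r in rc.findall("rule")])
            for rc in root.findall("rateclass")
        ]

    def fire_engine(self, case):
        # Phase 2, scoring: fuzzy scores, aggregation, and rateclass placement.
        scores, aggregates, placement = {}, {}, None
        for name, rules in self.rateclasses:
            scores[name] = {var: linear(case[var], full, zero)
                            for var, full, zero in rules}
            aggregates[name] = min(scores[name].values())  # assumed Intersection
            if placement is None and aggregates[name] > self.min_threshold:
                placement = name  # best rateclass that is not ruled out
        return {"placement": placement, "scores": scores, "aggregates": aggregates}


engine = RuleEngine(RULES_XML)
print(engine.fire_engine({"cholesterol": 192, "systolic": 145}))
```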

Fig. 4 is a flowchart illustrating the steps performed in a process for underwriting an 
insurance application using fuzzy logic rules according to an embodiment of the 
invention. At step 400, a request to underwrite an insurance application may be 
received. The request to underwrite may come directly from a consumer (e.g., the 
person being insured), an insurance agent or another person. The request to
underwrite comprises information about one or more components of the insurance
application. According to an embodiment of the invention, the components may 
include the various characteristics associated with the individual to be insured, such as 
a cholesterol level, a blood pressure level, a pulse, and other characteristics. 

At step 410, at least one decision component is evaluated. As described above, 
evaluating a decision component may comprise evaluating a decision component 
using a fuzzy logic rule. To perform the evaluation, a rule may be defined and 
assigned to the decision component. While each rule is generally only assigned one 
decision component, it is understood that more than one decision component may be 
assigned to each rule. Further, parameters for each rule may be defined, as also 
described above. 

At step 420, at least one measurement is assigned to the at least one decision 
component. As described above with regard to the application of a fuzzy logic rule, a 
measurement may be assigned to the decision component from a sliding scale, such as 
between zero (0) and one (1). Other types of measurements may also be assigned. 

At step 430, each decision component is assigned a specific component category 
based on the assigned measurement. As described above, a number of specific 
component categories are defined. Based on the assigned measurements, each 
decision component is assigned to one or more specific component categories. By 
way of the examples above, the specific component categories may be defined as 
cat1, cat2, cat3, and cat4. Cat1 may only be assigned decision components at a 
certain level or higher. Similarly, cat2 may only be assigned decision components at 
a second level or higher and so on. Other methods for assigning a specific component 
category may also be used. 

At step 440, the insurance application is assigned to a category. According to an 
embodiment of the invention, the categories to which the insurance application is 
assigned are the same as the categories to which the insurance decision components 
are assigned. As described above, the insurance application may be assigned to a 
category based upon how the decision components were assigned. Thus, by way of 
example, an insurance application may be assigned to cat1 only if two or fewer 
decision components are assigned to cat2 and all other decision components are 
assigned to cat1. Other methods for assigning an insurance application to a category 
may also be used. 
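
By way of illustration only, steps 430 and 440 may be sketched as follows. This is a minimal sketch under stated assumptions, not the claimed implementation: the function names, the cutoff values in CATEGORY_FLOORS, and the fallback rule for cat2 through cat4 are hypothetical. Only the cat1 rule (at most two decision components in cat2, all others in cat1) is taken from the example above.

```python
# Hypothetical cutoffs: a component scoring at or above a floor qualifies for
# that category (cat1 is the strictest). The numeric values are illustrative.
CATEGORY_FLOORS = [("cat1", 0.9), ("cat2", 0.7), ("cat3", 0.4), ("cat4", 0.0)]
ORDER = [name for name, _ in CATEGORY_FLOORS]


def component_category(measurement):
    """Step 430: map a measurement in [0, 1] to the best category it reaches."""
    for name, floor in CATEGORY_FLOORS:
        if measurement >= floor:
            return name
    raise ValueError("measurement must lie in [0, 1]")


def application_category(measurements):
    """Step 440: place the application from its components' categories."""
    cats = [component_category(m) for m in measurements.values()]
    worst = max(cats, key=ORDER.index)
    # Rule from the text: cat1 tolerates at most two components in cat2.
    if worst == "cat1" or (worst == "cat2" and cats.count("cat2") <= 2):
        return "cat1"
    # Assumption: otherwise the application takes its worst component category.
    return worst
```

For instance, an application with one component in cat1 and two in cat2 would still be placed in cat1 under this sketch, while a third cat2 component would move it to cat2.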

At step 450, an insurance policy is issued. Based on the category to which it is 
assigned, certain amounts are paid to maintain the insurance policy in a manner that is 
well known in the industry. It is understood that based on a category, an insurance 
policy may not be issued. The customers may decide the premiums are too high. 
Alternatively, the insurance company may determine that the risk is too great, and 
decide not to issue the insurance policy. 

CASE BASED REASONING 

A rule-based reasoning (RBR) system may provide for an underwriting process by 
following a generative approach, typically a rule-chaining approach, in which a 
deductive path is created from the evidence (facts) to the decisions (goals). A case- 
based reasoning (CBR) system, on the other hand, may follow an analogical approach 
rather than a deductive approach. In such a system, a reasoner may determine the 
correct rate class suitable for underwriting by noticing a similarity of an application 
for insurance with one or more previously underwritten insurance applications and by 
adapting known solutions of such previously underwritten insurance applications 
instead of developing a solution from scratch. A plurality of underwriting 
descriptions and their solutions are stored in a CBR Case Base and are the basis for 
measurement of the CBR performance. According to an embodiment of the 
invention, a CBR system may be only as good as the cases within its Case Base (also 
referred to as "CB") and its ability to retrieve the most relevant cases in response to a 
new situation. 

A case-based reasoning system can provide an alternative to a rules-based expert 
system, and may be especially appropriate when a number of rules needed to capture 
an expert's knowledge is unmanageable, when a domain theory is too weak or 
incomplete, or when such domain theory is too dynamic. The CBR system has been 
successful in areas where individual cases or precedents govern the decision-making 
processes. 

In many aspects, a case-based reasoning system and process is a problem solving 
method different from other artificial intelligence approaches. In particular, instead of 
using only general domain dependent heuristic knowledge, such as in the case of an 
expert system, specific knowledge of concrete, previously experienced, problem 
situations may be used with CBR. Another important characteristic may be that CBR 
implies incremental learning, as a new experience is memorized and available for 
future problem solving each time a problem is solved. CBR may involve solving new 
problems by identifying and adapting solutions to similar problems stored in a library 
of past experiences. 

According to an embodiment of the invention, an inference cycle of the CBR process 
may comprise a plurality of steps, as illustrated in the flow chart of Fig. 5. At step 
502, probing and retrieving one or more relevant cases from a case library is 
performed. Ranking the retrieved relevant cases, based on a similarity measure 
occurs at step 504. At step 506, one or more best cases are selected. At step 508, one 
or more retrieved relevant cases are adapted to a current case. The retrieved, relevant 
cases are evaluated versus the current case, based on a confidence factor at step 510. 
The newly solved case is stored in the case memory at step 512. 

These steps will be illustrated below within the context of insurance underwriting. 
However, one of ordinary skill in the art will recognize that these steps may be used 
in other contexts as well. For purposes of this example only, assume that an applicant 
provides his/her vital sign information (e.g., an age, a weight, a height, a systolic 
blood pressure level and a diastolic blood pressure level, a cholesterol level and a 
ratio, etc.) as a vector equal to: 

X = [x_1, x_2, ..., x_n]. 

Furthermore, in this example, assume that two of the values corresponding to the 
cholesterol level, and a weight-to-height ratio, are above normal levels, while the 
others fall within normal ranges. The first two components of vector X correspond to 
the cholesterol level (x_1) and the weight-to-height ratio (x_2). For purposes of this 
example, the applicant has an abnormally high cholesterol ratio (8.5%) and is 
overweight (weight-to-height ratio = 3.8 lb/inch). Furthermore, the applicant has one 
medical condition/history, for instance a history of hypertension. This condition may 
require the applicant to provide additional detailed information related to the history 
of hypertension, e.g., a cardiomegaly, a chest pain, a blood pressure mean and a trend 
over the past three months (where mean is the average of the blood pressure readings 
over a particular time period and trend corresponds to the slope of the readings, such as 
upward or downward). The detailed information may be contained in a vector Y 
= [y_1, y_2, ..., y_p], where the value of p will vary according to the applicant's medical 
condition. 

The first step in the CBR methodology may be to represent a new case (probe) as a 
query in a structured query language (SQL), which may be formulated against a 
database of previously placed applicants (cases). According to an embodiment of the 
invention, the SQL query may be of the form: 

Q: [f_1(x), f_2(x), ..., f_n(x)] AND [Condition = label] 

where [f_1(x), f_2(x), ..., f_n(x)] will be a vector of n fuzzy preference functions, one for 
each of the elements of vector X, and label will be an index representing the 
applicant's current medical condition. For this example, the CBR system may 
retrieve all previous applicants with a history of hypertension, whose vital signs were 
normal, except for a cholesterol ratio and a weight-to-height ratio. In other words, the 
SQL query may be for all cases matching the same condition and similar vital 
information as the applicant. An example of such a SQL query may be: 

Q1 = [Support(Around(8.5%; x)), Support(Around(3.8; x)), Support(Normal(i)), ..., 
Support(Normal(n))] AND [Condition = Hypertension] 

The meaning of Normal(i) may be determined by a fuzzy logic set representing a soft 
threshold for a variable, x(i), as it is used in the stricter class rate (e.g., Preferred Best 
in the case of Life Insurance). Fig. 6 illustrates the case of Normal(j), where x(j) 
corresponds to the cholesterol ratio. For example, it may be determined from the 
most experienced underwriters of the insurance company that under no circumstances 
can the applicant get the best class rate if his/her cholesterol ratio is above X1 by 
more than five points. In that example, one may use X1b - X1a = 5. The specific 
values for X1a and X1b may be parameters of the model, and will be explained below 
in greater detail. To obtain the partial degree of satisfaction when the cholesterol ratio 
value falls within the range [X1a, X1b], a continuous switching function may be used 
which interpolates between the values 1 and 0. The simplest such function is a 
straight line, but other functions may also be used. 

In a linear membership function as shown in Fig. 6, the values X1a and X1b are the 
low and high cutoffs, respectively. A strict yes/no rule may be recovered in the limit 
that X1a = X1b. Thus, many methods that mix fuzzy and strict rules in any proportion 
may be covered as a subset of this method. 
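
By way of illustration only, the linear membership function may be sketched as follows. The function name normal and the argument names low_cutoff and high_cutoff (standing for X1a and X1b) are hypothetical, and the numeric cutoffs in the usage note are invented for the example. Setting both cutoffs to the same value recovers the strict yes/no rule mentioned above.

```python
def normal(x, low_cutoff, high_cutoff):
    """Degree in [0, 1] to which x satisfies a soft 'normal' threshold.

    Values at or below low_cutoff fully satisfy the rule, values at or above
    high_cutoff do not satisfy it at all, and values in between are linearly
    interpolated from 1 down to 0.
    """
    if x <= low_cutoff:
        return 1.0
    if x >= high_cutoff:
        return 0.0
    return (high_cutoff - x) / (high_cutoff - low_cutoff)
```

With hypothetical cutoffs of 5.0 and 10.0 (a difference of five points), a cholesterol ratio of 7.5 would receive a partial degree of satisfaction of 0.5.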

Around (a; x) may be determined by a fuzzy relationship, whose membership function 
can be interpreted as the degree to which the value x meets the property of "being 
around a." If Around (a; x) = 1, then the value of x may be close to a, well within a 
desired tolerance. The support of the fuzzy relationship Around (a; x) may be defined 
as the interval of values of x for which Around (a; x) > 0, as illustrated in Fig. 7. If 
Around (a; x) = 0, then the value of x is too far from a, beyond any acceptable 
tolerance. 

The core of the fuzzy relationship Around (a; x) may be defined as the interval of 
values of x for which Around (a; x) = 1, as illustrated in Fig. 8. Any value belonging 
to the core fully satisfies the property and, in terms of a preference, it is 
indistinguishable from any other value in the core. 

A trapezoidal membership distribution representing the relationship may have a 
natural preference interpretation. The support of the distribution may represent a 
range of tolerable values and correspond to an interval-value used in an initial SQL 
retrieval query. The core may represent the most desirable range of values and may 
establish a top preference. By definition, a feature value falling inside the core will 
receive a preference value of 1. As the feature value moves away from a most 
desirable range, its associated preference value will decrease from 1 to 0. By 
retrieving the cases having cholesterol ratios falling in the support of Around (8.5%; 
x) and having weight-to-height ratios falling in the support of Around (3.8; x) all 
possible relevant cases may be retrieved. 
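
By way of illustration only, a trapezoidal Around (a; x) relationship, together with its support, may be sketched as follows. The symmetric shape and the parameter names core_tol and support_tol are assumptions made for the example; the source does not specify the widths of the core or the support.

```python
def around(a, x, core_tol, support_tol):
    """Trapezoidal degree in [0, 1] to which x is 'around a'.

    |x - a| <= core_tol     -> 1.0 (x lies in the core)
    |x - a| >= support_tol  -> 0.0 (x lies outside the support)
    otherwise               -> linear interpolation between the two
    """
    if not 0 <= core_tol < support_tol:
        raise ValueError("require 0 <= core_tol < support_tol")
    distance = abs(x - a)
    if distance <= core_tol:
        return 1.0
    if distance >= support_tol:
        return 0.0
    return (support_tol - distance) / (support_tol - core_tol)


def support_interval(a, support_tol):
    """Open interval of x values on which around(a, x, ...) is greater than 0."""
    return (a - support_tol, a + support_tol)
```

The interval returned by support_interval is the kind of range that could be placed in the initial retrieval query, while around itself supplies the preference value later used for ranking.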

In executing an SQL query Q1 of the above example against the CBR database, N 
cases may be retrieved. By construction, all N cases must have all of their vital values 
inside the support of the corresponding element x(i) defined by Q1. Furthermore, all 
cases must be related to the same medical condition (e.g., hypertension). 
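
By way of illustration only, the retrieval step may be sketched with an in-process SQL database. The table name cases, its column names, and the function retrieve_cases are hypothetical; the source does not describe the schema of the Case Base. Each support is passed as an open interval, and the medical condition is matched exactly.

```python
import sqlite3

# Hypothetical schema: table "cases" with one column per vital sign plus
# "condition" and "rate_class". Column names are illustrative assumptions.
ALLOWED_COLUMNS = {"cholesterol_ratio", "weight_height_ratio"}


def retrieve_cases(conn, supports, condition):
    """Return (case_id, rate_class) rows whose vitals fall inside every support.

    supports maps a column name to the open interval (low, high) on which the
    corresponding fuzzy preference function is greater than zero.
    """
    clauses, params = [], []
    for column, (low, high) in supports.items():
        # Column names cannot be bound as SQL parameters, so check them first.
        if column not in ALLOWED_COLUMNS:
            raise ValueError(f"unknown column: {column}")
        clauses.append(f"{column} > ? AND {column} < ?")
        params += [low, high]
    sql = ("SELECT case_id, rate_class FROM cases WHERE "
           + " AND ".join(clauses + ["condition = ?"])
           + " ORDER BY case_id")
    return conn.execute(sql, params + [condition]).fetchall()
```

The rows returned by such a query would correspond to the N retrieved cases; their similarity to the probe is evaluated in a later step.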

At this point, considering the outputs of each of the N retrieved cases may provide a 
first preliminary decision. According to an embodiment of the invention, a decision 
may be made only on the retrieved cases, i.e., only using the first n variables and the 
label used in the SQL query Q1. Each retrieved case may be referred to as a case C_k 
(k between 1 and N), and an output classification of case C_k as O_k, where O_k is a 
variable having an attribute value indicating the rate class assigned to the applicant 
corresponding to case C_k. By way of example, O_k may assume one out of T possible 
values, i.e., O_k = L, where L ∈ {R_1, R_2, ..., R_T}. For instance, in the case of Life 
insurance products, L={Preferred-Best, Preferred, Preferred-Nicotine, ..., Standard, 
Table-32}. Other values may also be used. 

In this example, the SQL query Q1 retrieves 40 cases (N=40). Fig. 9 illustrates the 
histogram (distribution of the retrieved cases over the rate classes) of the results of the 
SQL query Q1. As seen in Fig. 9, a first preliminary decision indicates Table-II as 
being the most likely rate class for the new applicant represented by the SQL query 
Q1. 

All N cases may have all their vital values inside the support of the corresponding 
element x(i) defined by the SQL query Q1 and they are all related to the same medical 
condition (e.g., hypertension). Therefore, each case may also contain p additional 
elements corresponding to the variables specific to the medical condition. A case C_k 
(k between 1 and N) may be represented as an r-dimensional vector, where r = n + p. 
The first n elements correspond to the n vital signs described by the vector X, namely 
[x_{1,k}, x_{2,k}, ..., x_{n,k}]. The remaining p elements may correspond to the specific features 
related to the condition hypertension, namely [x_{n+1,k}, x_{n+2,k}, ..., x_{r,k}]. The value of p 
may vary according to the value of the label, i.e., the medical condition. 

A degree of matching between case C_k and the SQL query Q1 may be determined. To 
this end, the n-dimensional vector M(C_k, Q1) may be defined as an evaluation of 
each of the functions [f_1(x), f_2(x), ..., f_n(x)] from the SQL query Q1 with the first n 
elements of C_k, namely [x_{1,k}, x_{2,k}, ..., x_{n,k}]: 

M(C_k, Q1) = [f_1(x_{1,k}), f_2(x_{2,k}), ..., f_n(x_{n,k})] 

At the end of this evaluation, each case will have a preference vector whose elements 
take values in the (0,1] interval (where the notation (0,1] indicates that this is an open 
interval at 0 (i.e., it does not include the value 0), and a closed interval at 1 (i.e., it 
includes the value 1)). These values may represent a partial degree of membership of 
each case's feature values in the fuzzy relationships representing the preference 
criteria in the SQL query Q1. Since this preference vector represents a partial order, 
the CBR system aggregates its elements to generate a ranking of the cases, according 
to their overall preference. 

A determination is made of an n-dimensional weight vector W = [w_1, w_2, ..., w_n] in 
which the element w_i takes a value in the interval [0,1] and determines the relative 
importance of feature i in M(C_k, Q1), i.e., the relevance of f_i(x_{i,k}). According to an 
embodiment of the invention, this can be done via direct elicitation from an 
underwriter or using pair-wise comparisons, following Saaty's method. By way of 
example, if all features are equally important, all their corresponding weights may be 
equal to 1. Other methods may also be used. Once the weight vector has been 
determined, several aggregating functions may be used to rank the cases, where the 
aggregating function will map an n-dimensional unitary hypercube into a 
one-dimensional unit interval, i.e., [0,1]^n -> [0,1]. 

To consider compensation among the elements, a definition is made of the 
aggregating function A[W, M(C_k, Q1)] as a weighted sum of its elements, i.e.: 

A[W, M(C_k, Q1)] = \sum_{i=1}^{n} w_i f_i(x_{i,k}) 

Alternatively, a strict intersection aggregation without compensation may be obtained 
using a weighted minimum, i.e. : 

A[W, M(C_k, Q1)] = \min_{i=1,...,n} [\max((1 - w_i), f_i(x_{i,k}))] 

Regardless of the aggregating function selected, it may be considered as a measure of 
similarity between each retrieved case C_k and the query Q1, and may be referred 
to as S(k,1). Using this measure, cases may be sorted according to an overall degree 
of preference, which may be interpreted as a measure of similarity between each 
retrieved case C_k and the query Q1. 
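
By way of illustration only, the matching vector and the two aggregating functions may be sketched as follows. The function names are hypothetical. One deliberate departure is labeled in the code: the weighted sum is divided by the total weight, which is an assumption made here so that the result stays in [0,1] when, for example, all weights equal 1; the source states only a weighted sum.

```python
def match_vector(preference_functions, case_values):
    """M(C_k, Q1): evaluate each f_i on the i-th vital value of the case."""
    return [f(x) for f, x in zip(preference_functions, case_values)]


def weighted_sum(weights, match):
    """Compensatory aggregation: a low score can be offset by a high one.

    Assumption: normalized by the sum of the weights to remain in [0, 1].
    """
    return sum(w * m for w, m in zip(weights, match)) / sum(weights)


def weighted_minimum(weights, match):
    """Strict intersection without compensation.

    A feature with weight w_i cannot pull the result below 1 - w_i, so
    unimportant features (small w_i) have little effect on the minimum.
    """
    return min(max(1.0 - w, m) for w, m in zip(weights, match))
```

With a matching vector of [1.0, 0.5, 0.75] and equal weights of 1, the normalized weighted sum would be 0.75 while the weighted minimum would be 0.5, which shows how the second function refuses to let strong features compensate for a weak one.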

In the first preliminary decision, the output of case C_k may be referred to as O_k, where 
O_k is a variable whose attribute value indicates a rate class assigned to the applicant 
corresponding to a case C_k. Assume, for example, that O_k can take one out of T 
possible values, i.e., O_k = L, where L ∈ {R_1, R_2, ..., R_T}. For instance, in the case of 
Life insurance products, L={Preferred-Best, Preferred, Preferred-Nicotine, ..., 
Standard, Table-32}. However, not all cases are equally similar to the probe. Fig. 
10 illustrates a distribution of the similarity measure S(k,1) over the T values of O_k for the 
retrieved N cases (e.g., N = 40 in the present example). 

According to an embodiment of the invention, a minimum similarity value may be 
considered for a case. For instance, to only consider similar cases, a threshold may be 
established on the similarity value. By way of example, only cases with a similarity 
greater than or equal to 0.5 may be considered. According to an embodiment of the 
invention, a determination may be made of a fuzzy cardinality of each of the rate 
classes, by adding up the similarity values in each class. Other distributions may also 
be evaluated. 

A histogram may be drawn that aggregates the original retrieval frequency with the 
similarity of the retrieved cases, and may be referred to as a pseudo-histogram. This 
process may be similar to an N-Nearest Neighbor approach, where the N retrieved 
cases represent the N points in the neighborhood, and the value of S(k,1) represents 
the complement of the distance between the point k and the probe, i.e., the similarity 
between each case and a query. The rate class R_i with the largest cumulative 
measure may be proposed as a solution. By way of example, Table-II is the solution 
indicated by either option. 
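
By way of illustration only, the pseudo-histogram and the resulting proposal may be sketched as follows. The function names and the sample data in the usage note are hypothetical; the 0.5 threshold is the example value given above.

```python
from collections import defaultdict


def pseudo_histogram(cases, min_similarity=0.0):
    """Fuzzy cardinality per rate class.

    cases is an iterable of (rate_class, similarity) pairs. Cases below the
    similarity threshold are ignored; the rest contribute their similarity
    (rather than a count of 1) to the total of their rate class.
    """
    totals = defaultdict(float)
    for rate_class, similarity in cases:
        if similarity >= min_similarity:
            totals[rate_class] += similarity
    return dict(totals)


def propose_rate_class(histogram):
    """Return the rate class with the largest cumulative similarity."""
    return max(histogram, key=histogram.get)
```

For example, given the pairs ("Table-II", 0.9), ("Table-II", 0.8), ("Standard", 0.6) and ("Standard", 0.4) with a threshold of 0.5, the last case is dropped and Table-II is proposed with a fuzzy cardinality of about 1.7.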

A decision may be made on how many cases will be used to refine a solution. Having 
sorted the cases along the first n dimensions, the remaining p dimensions may be 
analyzed corresponding to the features related to the specific medical condition. 
Some of these medical conditions may have variables with binary or attribute values 
(e.g., chest pain (Y/N), malignant hypertension (N), Mild, Treated, etc.), while 
others may have continuous values (e.g., cardiomegaly (% of enlargement), systolic 
and diastolic blood pressure averaged and trend in past 3 months, 24 months, etc.). 

An attribute-value and a binary-value may be used to select, among the N retrieved 
cases, the cases that have the same values. This may be the same as performing a 
second SQL query, thereby refining the first SQL query Q1. From the originally 
retrieved N cases, the cases with the correct binary or attribute values may be 
selected. This may be done for all of the attribute-values and the binary-valued 
variables, or for a subset of the most important variables. After this selection, the 
original set of cases will likely have been reduced. However, when a Case Base is not 
sufficiently large, a reduction in the number of variables used to perform this selection 
may be needed. Assuming that there are now L cases (where L<N), these cases may 
still be sorted according to a value of a similarity metric S(k,1). 

A third preliminary decision may be obtained by re-computing the distribution of the 
similarity measure S(k,1) over the T values for the output O_k, and then proposing as a 
solution the class R_i with the largest cumulative measure using the same 
pseudo-histogram method described above. 

A similarity measure over the numerical features related to the medical condition may 
be obtained by establishing a fuzzy relationship Around(a; x) similar to the one 
described above. This fuzzy relationship would establish a neighborhood of cases 
with similar condition intensities. By performing an evaluation and an aggregation 
similar to the one described above, a similarity measure may be obtained by medical 
condition, and may be referred to as I(k,1). 

A final decision may involve creating a linear combination of both similarity 
measures: 

F(k,1) = \alpha S(k,1) + (1 - \alpha) I(k,1), 

thereby providing the distribution of the final similarity measure F(k,1) over the T 
values of O_k. According to an embodiment of the invention, the final decision or 
solution may be the class R_i with the largest cumulative measure using the same 
pseudo-histogram method. 
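
By way of illustration only, the final decision may be sketched as follows. The function names are hypothetical, and restricting the mixing coefficient alpha to [0,1] is an assumption; the source does not state its range or how it is chosen.

```python
def final_similarity(s, i, alpha):
    """F = alpha * S + (1 - alpha) * I for one retrieved case."""
    if not 0.0 <= alpha <= 1.0:  # assumed range for the mixing coefficient
        raise ValueError("alpha must lie in [0, 1]")
    return alpha * s + (1.0 - alpha) * i


def final_decision(cases, alpha):
    """Propose the rate class with the largest cumulative final similarity.

    cases is an iterable of (rate_class, S, I) triples, where S is the
    vital-sign similarity and I the medical-condition similarity.
    Returns the proposed class and the per-class totals.
    """
    totals = {}
    for rate_class, s, i in cases:
        totals[rate_class] = totals.get(rate_class, 0.0) + final_similarity(s, i, alpha)
    return max(totals, key=totals.get), totals
```

Setting alpha to 1 reduces the final measure to the vital-sign similarity alone, while smaller values give more influence to the similarity of the condition-specific features.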

A reliability of the solution may be measured in several ways, and as a function of 
many internal parameters computed during this process. According to an embodiment 
of the invention, the number of retrieved (N) and refined (L) cases (e.g., area of the 
histogram) may be measured. Larger values of N + L may imply a higher reliability 
of the solution. According to another embodiment of the invention, the fuzzy 
cardinality of the retrieved and refined cases (i.e., area of the pseudo-histogram) may 
be measured. Larger values may imply a higher reliability of the solution. According 
to a further embodiment of the invention, the shape of the pseudo-histogram of the 
values of O_k (i.e., spread of the histogram) may be measured, where a tighter 
distribution (smaller sigmas) would be more reliable than scattered ones. According 
to another embodiment of the invention, the mode of the pseudo-histogram of the 
values of O_k (e.g., maximum value of the histogram) may be measured. Higher 
values of the mode may be more reliable than lower ones. A contribution of one or 
more of these measurements may be used to determine reliability. Other 
measurements may also be used. 
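
By way of illustration only, three of these measurements may be computed from a pseudo-histogram as sketched below. The function name is hypothetical. Measuring the spread as a weighted standard deviation over the ordinal position of each rate class is an assumption made for the example; the source refers only to "smaller sigmas" without defining the scale.

```python
import math


def reliability_measures(histogram, class_order):
    """Summarize a pseudo-histogram (rate class -> cumulative similarity).

    class_order lists the rate classes from best to worst; a class's index in
    that list is used as its position when measuring the spread (assumption).
    """
    area = sum(histogram.values())
    if area <= 0:
        raise ValueError("histogram must contain positive mass")
    mode = max(histogram.values())
    position = {name: idx for idx, name in enumerate(class_order)}
    mean = sum(position[c] * v for c, v in histogram.items()) / area
    variance = sum(v * (position[c] - mean) ** 2 for c, v in histogram.items()) / area
    return {"area": area, "mode": mode, "spread": math.sqrt(variance)}
```

A histogram concentrated in a single rate class yields a spread of 0, whereas mass divided among distant classes yields a larger spread and would suggest a less reliable solution.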

Using a training set, a conditional probability of misclassification as a function of 
each of the above parameters may be determined, as well. Then, the (fuzzy) ranges of 
those parameters may be determined and a confidence factor may be computed. 

If the solution does not pass a confidence threshold (e.g., because it does not have 
enough retrieved cases, has a scattered pseudo-histogram, etc.), then the CBR system 
may suggest a solution to the individual underwriter and delegate to him/her the final 
decision. Alternatively, if the confidence factor is above the confidence threshold, 
then the CBR system may validate the underwriter's decision. Regardless of the 
decision maker, once the decision is made, the new case and its corresponding 
solution are stored in the Case Base, becoming available for new queries. 

According to an embodiment of the invention, clean cases (previously placed by rule 
base) may be used to tune the CBR parameters (e.g., membership functions, weights, 
and similarity metrics), thereby abating risk. Other methods for abating risk may also 
be used. 

By defining and using three stages of preliminary decisions, the CBR system may 
display tests, thereby generating useful information for the underwriter while the Case 
Base is still under development. As more information (cases and variables describing 
each case) is stored in the Case Base, the CBR system may be able to use a more 
specific decision stage. 

According to an embodiment of the invention, the first two preliminary decision 
stages may only require the same vital information used for clean applications and the 
symbolic (i.e., label) information of the medical condition. A third decision stage 
may make use of a subset of the variables describing the medical condition thereby 
refining the most similar cases. The subset of variables may be chosen by an expert 
underwriter as a function of their relevance to the insured risk (mortality, morbidity, 
etc.). This step will allow the CBR system to refine the set of N retrieved cases, and 
select the most similar L cases, on the basis of the most important binary and attribute 
variables describing the medical condition. The final two preliminary decision stages 
may only require the same vital information used for clean applications and the 
symbolic (i.e., label) information of the medical condition. 

WO 03/065268    PCT/US02/40464 

According to an embodiment of the invention, it may be important that at all times the 
value of N (for the first two decision stages) and the value of L (for the third decision 
stage) be large enough to ensure significance. The number of cases used may be one 
of the parameters used to compute the confidence factor described above. 

In the first step of the example, the new case (probe) was represented as a SQL query, 
and it was assumed that only one medical condition was present. The complete SQL 
query Q may have been formulated as: 

Q: [f_1(x), f_2(x), ..., f_n(x)] AND [Condition = label] AND [Condition number = 1] 

If the applicant has more than one medical condition, the applicant may be compared 
with other applicants having the same medical conditions. By way of another 
example extending the original example used, the applicant is assumed to have an 
abnormally high cholesterol ratio (8.5%) and be overweight (weight-to-height ratio = 
3.8 lb/inch). Furthermore, the applicant discloses that he/she has two medical 
conditions (e.g., hypertension and diabetes). 

In a densely populated Case Base, the applicant may be represented by the query: 

Q: [f_1(x), f_2(x), ..., f_n(x)] AND [Condition 1 = label 1] AND [Condition 2 = label 2] 
AND [Condition number = 2] 

This query may be instantiated as: 

Q1: [Support(Around(8.5%; x)), Support(Around(3.8; x)), Support(Normal(i)), ..., 
Support(Normal(n))] AND [Condition = Hypertension] AND [Condition = Diabetes] 
AND [Condition number = 2] 

With a well-populated Case Base, this may be a process for handling multiple medical 
conditions in complex cases. 

As more conditions are added to a query, fewer cases will likely be retrieved. If the 
retrieved number of cases N is not significant, a useful decision may not be produced. 
An alternative (surrogate) solution may be to decompose a query into two separate 
queries, treating each medical condition separately. For instance, assuming that the 
modified query Q1 requesting two simultaneous conditions does not yield any 
meaningful result, the CBR system may decompose the query Q1 into a plurality of 
queries, Q1-A and Q1-B: 

where Q1-A: [Support(Around(8.5%; x)), Support(Around(3.8; x)), 
Support(Normal(i)), ..., Support(Normal(n))] AND [Condition = Hypertension] AND 
[Condition number = 1]; and 

where Q1-B: [Support(Around(8.5%; x)), Support(Around(3.8; x)), 
Support(Normal(i)), ..., Support(Normal(n))] AND [Condition = Diabetes] AND 
[Condition number = 1] 

Each query may be treated separately and may obtain a decision on the rate class for 
each of the queries. In other words, it may be assumed that there are two applicants, 
both overweight and with a high cholesterol ratio, one with hypertension and one with 
diabetes. 

After obtaining suggested placements in the appropriate rate class, (e.g., RC-A and 
RC-B, respectively) the answers may be combined according to a set of aggregation 
rules representing the union of multiple rate classes induced by the presence of 
multiple medical conditions. According to an embodiment of the invention, these 
rules may be elicited from experienced underwriters. A look-up table, as illustrated in 
Fig. 11, may represent this rule set. Fig. 11 is just an example that shows a linear 
aggregation of the rate classes. Assume that the rate class assigned to query Q1-A is 
RC-A = Table 6 and the rate class assigned to query Q1-B is RC-B = Table 8. The 
combined rate class generated from the aggregation rule is RC = Table 14. Other 
tables may be designed to over-penalize the occurrence of multiple conditions as their 
presence might affect risk and, therefore, claims, in a non-linear fashion. For example, 
RC-A = Table 6 and RC-B = Table 8 could be aggregated into RC = Table 18 by a 
stricter table. Other aggregation processes may also be used. 

Additionally, these tables may be used in an associative fashion. In other words, 
when an applicant has three or more medical conditions, the CBR system may 
aggregate the rate classes derived from the first two medical conditions, obtain the 
result and aggregate the result with the rate class obtained from the third medical 
condition, and so on, as illustrated in Fig. 11. This method is a surrogate alternative 
that may be used when not enough cases with multiple conditions are included in the 
Case Base. 
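
By way of illustration only, a linear aggregation rule and its associative use across three or more medical conditions may be sketched as follows. The look-up table of Fig. 11 is not reproduced here, so the rule is inferred from the single worked example above (Table 6 combined with Table 8 gives Table 14). The function names and the cap at Table 32 are assumptions.

```python
from functools import reduce

MAX_TABLE = 32  # highest table rating named in the text (Table-32)


def aggregate_linear(table_a, table_b):
    """Linear rule: table numbers add. Capping at MAX_TABLE is an assumption."""
    return min(table_a + table_b, MAX_TABLE)


def aggregate_all(tables, rule=aggregate_linear):
    """Apply a pairwise rule associatively: ((t1, t2), t3), and so on."""
    return reduce(rule, tables)
```

A stricter, non-linear table could be substituted by passing a different pairwise function as the rule argument, for instance one backed by a dictionary keyed on pairs of rate classes.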

According to an embodiment of the invention, a CBR engine may be encoded into 
Java-based computer code, which can query a database to obtain the case parameters, 
and write its decision in the database as well. This embodiment may be suitable for 
batch processing, and for use in a fully automated underwriting environment. 

CALCULATION OF CONFIDENCE FACTOR 

As described above, CBR may be used to automate decisions in a variety of 
circumstances, such as, but not limited to, business, commercial, and manufacturing 
processes. Specifically, it may provide a method and system to determine at run-time 
a degree of confidence associated with the output of a Case Based Decision Engine, 
also referred to as CBE. Such a confidence measure may enable a determination to be 
made on when a CBE decision is trustworthy enough to automate its execution and 
when the CBE decision is not as reliable and may need further consideration. If a 
CBE decision is not determined to be as reliable, a CBE analysis may still be 
beneficial by providing an indicator, forwarding it to a human decision maker, and 
improving the human decision maker's productivity with an initial screening that may 
limit the complexity of the final decision. The run-time assessment of the confidence 
measure may enable the routing mechanism and increase the usefulness of a CBE. 

An embodiment of the invention may comprise two parts: a) the run-time computation 
of a confidence factor for a query; and b) the determination of the threshold to be used 
with the computed confidence factor. Fig. 12 is a flowchart illustrating a process for 
determining a run-time computation of a confidence factor according to an 
embodiment of the invention. At Step 1200, a confidence factor process is initiated. 
At Step 1210, CBE internal parameters that may affect the probability of 
misclassification are identified. At Step 1220, the conditional probability of 
misclassification for each of the identified parameters is estimated. At Step 1230, the 
conditional probability of misclassification is translated into a soft constraint for each 
parameter. At Step 1240, a run-time function to evaluate the confidence factor for 
each new query is defined. The determination of the threshold for the confidence 
factor may be obtained by using a gradient-based search. It is understood that other 
steps may be performed within this process, and/or the order of steps may be changed. 
The process of Fig. 12 will now be described in greater detail below. 

According to an embodiment of the invention, CBE may be used to automate the 
underwriting process of insurance policies. By way of example, CBE may be used for 
underwriting life insurance applications, as illustrated below. It is understood, 
however, that the applicability of this invention is much broader, as it may apply to 
any Case-Based Decision Engine(s). 

According to an embodiment of the invention, an advantage of the present invention 
may include improving deployment of a method and system of automated insurance 
underwriting, based on the analysis of previous similar cases, as it may allow for an 
incremental deployment of the CBE, instead of postponing deployment until an entire 
case base has been completely populated. Further, a determination may be made for 
which applications (e.g., characterized by specific medical conditions) the CBE can
provide sufficiently high confidence in the output to shift its use from a human 
underwriter productivity tool to an automated placement tool. As a case base (also 
referred to as a "CB") is augmented and/or updated by new resolved applications, the 
quality of the retrieved cases may improve. Another advantage of the present 
invention may be that the quality of the case base may be monitored, thereby 
indicating the portion of the case base that requires growth or scrubbing. For 
instance, monitoring may enable identification of regions in the CB with insufficient 
coverage (small area histograms, low similarity levels), regions containing
inconsistent decisions (bimodal histograms), and ambiguous regions (very broad
histograms). 

In addition, by establishing a confidence threshold, a determination may be made 
whether the output can be used directly to place the application or if it will be a 
suggestion to be revised by the human underwriter, where such a determination may 
be made for each application processed by the CBE. Further, according to an 
embodiment of the invention, a process may be used after the deployment of the CBE, 
as part of maintenance of the case base. As the case base is enriched by the influx of 
new cases, the distribution of its cases may also vary. Regions of the case base that 
were sparsely populated might now contain a larger number of cases. Therefore, as 
part of the tuning of the CBE, one may periodically recompute certain steps within the 
process to update the soft constraints on each of the parameters. As part of the same 
maintenance, one may also periodically update the value of the best threshold to be 
used in the process. 

While the present invention is described in relation to applicability to the 
improvement of the performance of a Case Based Engine for Digital Underwriting, it 
is understood that the method and system described herein may be applied to any 
Case Based Reasoning system, to annotate the quality of its output and decide 
whether or not to act upon the generated output. By way of example, CBR systems 
may have applications in manufacturing, scheduling, design, diagnosis, planning, and 
other areas. 

As described above, the CBE relies on having a densely populated Case Base ("CB") 
from which to retrieve the precedents for the new application (i.e., the similar cases). 
According to an embodiment of the invention, until the CB contains a sufficiently
large number of cases for most possible applications, the CBE output may not be 
reliable. Such an output may, by way of example, be used as a productivity aid for a 
human underwriter, rather than an automation tool. 

For each processed application, a measure of confidence in the CBE output is 
computed so that a final decision maker (CBE or human underwriter) may be
identified. As the decision engine generates its output from the retrieval, selection,
and adaptation of the most similar cases, such a confidence measure may reflect the 
quality of the match between the input (the application under consideration) and the 
current knowledge, e.g., the cases used by the CBE for its decision. 

The confidence measure proposed by this invention needs to reflect the quality of the 
match between the current application under consideration and the cases used for the 
CBE decision. This measure needs to be evaluated within the context of the statistics 
for misclassification gathered from the training set. More specifically, according to 
an embodiment of the invention, the steps described below may be performed. These 
steps may include, but are not limited to, the following: 1) Formulate a query against 
the CB, reflecting the characteristics of the new application as query constraints; 2) 
Retrieve the most relevant cases from the case library. For purposes of illustration, 
assume that N cases have been retrieved, where N is greater than 0 (i.e., not a null 
query or an empty retrieved set of cases). A histogram of the N cases is generated 
over the universe of their responses, i.e., a frequency of the rate class; 3) Rank the 
retrieved cases using a similarity measure; 4) Select the best cases thereby reducing 
the total number of useful retrieved cases from N to L; and 5) Adapt the L refined 
solutions to the current case in order to derive a solution for the case. By way of 
example, selecting the mode of the histogram may be used to derive a solution. 
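
The five steps above can be sketched as follows. This is a minimal illustration only: the case layout (`features`, `rate_class`), the tolerance-based `similarity` function, and the `min_similarity` cut-off are hypothetical choices made for the example and are not specified by the text. Only the use of the histogram mode as the adaptation step is taken from the description.

```python
from collections import Counter

def similarity(case, query, tolerance):
    """Hypothetical similarity: 1.0 at an exact match, 0.0 at the edge of the tolerance."""
    return min(1.0 - abs(case["features"][k] - v) / tolerance[k] for k, v in query.items())

def decide(case_base, query, tolerance, min_similarity=0.5):
    # Steps 1-2: apply the query constraints and retrieve the N matching cases.
    retrieved = [c for c in case_base if similarity(c, query, tolerance) >= 0.0]
    # Step 3: rank the retrieved cases by similarity.
    ranked = sorted(retrieved, key=lambda c: similarity(c, query, tolerance), reverse=True)
    # Step 4: keep the L best cases.
    refined = [c for c in ranked if similarity(c, query, tolerance) >= min_similarity]
    # Step 5: adapt by taking the mode of the rate-class histogram.
    histogram = Counter(c["rate_class"] for c in refined)
    decision = histogram.most_common(1)[0][0] if refined else None
    return decision, len(retrieved), len(refined)

# Toy case base (made-up values).
case_base = [
    {"features": {"bmi": 24, "chol": 190}, "rate_class": "Preferred"},
    {"features": {"bmi": 25, "chol": 200}, "rate_class": "Preferred"},
    {"features": {"bmi": 27, "chol": 215}, "rate_class": "Standard"},
    {"features": {"bmi": 35, "chol": 280}, "rate_class": "Decline"},
]
decision, n_retrieved, n_refined = decide(
    case_base, {"bmi": 25, "chol": 195}, {"bmi": 4, "chol": 30}
)
```

In this toy run, three cases satisfy the query (N = 3), two survive the similarity cut-off (L = 2), and the mode of their rate classes is the decision.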

To determine the confidence in the decision, it may be desirable to understand what 
the probability of generating a correct or incorrect classification is. Specifically, it 
may be desirable to identify which factors affect misclassifications, and, for a given 
case, use these factors to assess if it is more or less likely to generate a wrong 
decision. According to an embodiment of the invention, unless a decision is binary, 
the decision will consist of placing the case under consideration in one of several
bins. Hence, there may be different degrees of misclassification, depending on the 
distance of the CBE decision from the correct value. Given the different costs 
associated with different degrees of misclassification, the factors impacting the 
decision may be used with the likely degree of misclassification.

One aspect of the present invention deals with the process and method used to
accomplish this result. At Step 1210, the CBE internal parameters that might affect
the probability of misclassification may be determined. Each of these parameters may
be referred to as an x_i. Furthermore, assume that there are M parameters (i.e.,
i = 1, ..., M), forming a parameter vector X = [x_1, x_2, ..., x_M].

Parameters that may affect the probability of misclassification include, but are not 
limited to, the following potential list of candidates: 

x_1: N = number of retrieved cases (i.e., cardinality of retrieved set and area of
histogram in Fig. 9), e.g., N = 40 cases.

x_2: variability of retrieved cases (measure of dispersion of histogram in Fig. 9).

x_3: number of retrieved cases thresholded by similarity value (area of histogram in
Fig. 10), e.g., 25 cases.

x_4: variability of retrieved cases thresholded by similarity value (measure of
dispersion of histogram in Fig. 10).

x_5: L = number of refined cases (i.e., cardinality of refined set), e.g., 21 cases.

x_6: variability of refined cases.

x_7: number of refined cases thresholded by similarity value, e.g., 16 cases.

x_8: variability of refined cases thresholded by similarity value.

x_9: measure of strength of mode (percentage of cases in mode of histogram), e.g., 50%.

According to an embodiment of the invention, other parameters may include: 

x_10: number of retrieved cases weighted by similarities (i.e., fuzzy cardinality of
retrieved set (area of histogram in Fig. 9)).

x_11: variability of retrieved cases weighted by similarities (measure of dispersion of
histogram in Fig. 9).

x_12: number of refined cases weighted by similarities (i.e., fuzzy cardinality of
refined set).

x_13: variability of refined cases weighted by similarities.

These parameters may be query-dependent (e.g., they may vary for each new
application). This may be in contrast to static design parameters, such as, but not
limited to, similarity weights, retrieval parameters, and confidence threshold. Static
parameters may be tuned at development time (e.g., when a system is initially
developed) and periodically revised at maintenance time(s) (e.g., during maintenance
periods for a system). According to an embodiment of the invention, static parameters
may be considered fixed while evaluating parameters [x_1 - x_13].
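
As a rough illustration of how such query-dependent parameters might be computed for one query, the sketch below derives counts, dispersions, mode strength, and fuzzy cardinalities from the retrieved and refined sets. Several choices are assumptions and not part of the description: cases are represented as hypothetical (rate class index, similarity) pairs, dispersion is measured as a population standard deviation, and mode strength is measured on the refined set.

```python
from collections import Counter
from statistics import pstdev

def dispersion(rate_classes):
    """Population standard deviation of ordinal rate-class indices (0.0 below two cases)."""
    return pstdev(rate_classes) if len(rate_classes) > 1 else 0.0

def confidence_parameters(retrieved, refined, sim_threshold):
    """Each argument is a list of (rate_class_index, similarity) pairs."""
    retrieved_hi = [c for c in retrieved if c[1] >= sim_threshold]
    refined_hi = [c for c in refined if c[1] >= sim_threshold]
    counts = Counter(rc for rc, _ in refined)
    return {
        "x1": len(retrieved),                            # cardinality of retrieved set
        "x2": dispersion([rc for rc, _ in retrieved]),
        "x3": len(retrieved_hi),                         # thresholded by similarity
        "x4": dispersion([rc for rc, _ in retrieved_hi]),
        "x5": len(refined),                              # cardinality of refined set
        "x6": dispersion([rc for rc, _ in refined]),
        "x7": len(refined_hi),
        "x8": dispersion([rc for rc, _ in refined_hi]),
        "x9": max(counts.values()) / len(refined) if refined else 0.0,  # mode strength
        "x10": sum(s for _, s in retrieved),             # fuzzy cardinality, retrieved set
        "x12": sum(s for _, s in refined),               # fuzzy cardinality, refined set
    }

# Toy query result (made-up values).
retrieved = [(3, 0.9), (3, 0.8), (4, 0.7), (5, 0.4)]
refined = [(3, 0.9), (3, 0.8), (4, 0.7)]
params = confidence_parameters(retrieved, refined, sim_threshold=0.75)
```

The static design parameters (here only `sim_threshold`) are passed in unchanged, matching the idea that they stay fixed while the query-dependent values are evaluated.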

According to an embodiment of the invention, the above parameters may likely be 
positively correlated. By way of example, the number of refined cases L may depend
on the total number of cases N. The relative impact of these parameters may be 
evaluated via a statistical correlation analysis, CART, C4.5 or other algorithms to 
identify and eliminate those parameters that contribute the least amount of additional 
information. By way of another example, methods may be used to handle partially 
redundant information in a way that avoids double counting of the evidence. The use
of a "minimum" operator in the computation of the Confidence Factor, as is described
below, is such an example. 

According to an embodiment of the invention, at step 1220, the conditional 
probability of misclassification for each parameter x_i (for i = 1, ..., 9) may be estimated.
By way of example, this step may be achieved by running a set of experiments with a 
training set. Given a certified Case Base (e.g., a CB containing a number K of cases 
whose associated decisions were certified correct), the following steps may then be 
followed:

(1) From the K cases in the CB, one case is selected and may be
considered as the probe, i.e., the case whose decision we want to determine (1310).

(2) The Case Based Engine (CBE) and the (K-1) cases remaining in the CB may then
be used to determine the rate class (i.e., the placement decision for the probe) (1320).

(3) The decision derived from the CBE may then be compared with the original
certified decision of the probe (1330).

(4) The comparison and its associated parameters [x_1 - x_9] may then be recorded.

(5) The selected case may be placed back in the CB and another case selected (i.e.,
back to step (1)) (1340).

(6) Perform steps (2) through (5) until all the K cases in the CB have been used as 
probes (1350). 

This process is illustrated in Fig. 13. Once the process is completed, the results may 
be collected and analyzed. The comparison matrix of Fig. 14 illustrates a comparison 
between a probe's decision derived from the CBE and the probe's certified reference 
decision. The cells located on the comparison matrix's main diagonal may contain 
the percentage of correct classifications. The cells off the main diagonal may contain 
the percentage of incorrect classifications. As was previously mentioned, there may 
be different degrees of misclassification, depending on the distance of a CBE decision 
from the corresponding reference decision. 
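
The leave-one-out procedure and the resulting comparison matrix can be sketched as below. The `classify` callback is a hypothetical stand-in for the CBE, and the nearest-neighbour rule on a single `age` feature is only a placeholder used to make the example runnable. Rate classes are assumed to be ordinal indices so that a signed distance can be recorded.

```python
def leave_one_out(case_base, classify):
    """Use each certified case once as a probe against the remaining K-1 cases."""
    records = []
    for k, probe in enumerate(case_base):
        remaining = case_base[:k] + case_base[k + 1:]   # probe held out, CB left intact
        derived = classify(probe, remaining)
        if derived is None:
            continue                                    # no decision could be produced
        records.append({
            "derived": derived,
            "certified": probe["rate_class"],
            "distance": derived - probe["rate_class"],
        })
    return records

def comparison_matrix(records, n_classes):
    """Percentage of probes in each (derived, certified) cell."""
    matrix = [[0.0] * n_classes for _ in range(n_classes)]
    for r in records:
        matrix[r["derived"]][r["certified"]] += 100.0 / len(records)
    return matrix

def nearest_neighbour(probe, remaining):
    """Placeholder decision engine: copy the rate class of the closest case by age."""
    if not remaining:
        return None
    return min(remaining, key=lambda c: abs(c["age"] - probe["age"]))["rate_class"]

# Toy certified case base (made-up values).
case_base = [
    {"age": 30, "rate_class": 0},
    {"age": 32, "rate_class": 0},
    {"age": 50, "rate_class": 1},
    {"age": 53, "rate_class": 1},
    {"age": 40, "rate_class": 1},
]
records = leave_one_out(case_base, nearest_neighbour)
matrix = comparison_matrix(records, n_classes=2)
percent_correct = sum(matrix[i][i] for i in range(2))
```

In this toy run four of the five probes are classified correctly, so the main diagonal sums to 80 percent and the remaining 20 percent falls in an off-diagonal cell.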

At this point, it may be desirable to estimate the conditional probability of
misclassification given each of parameters [x_1 - x_9]. Since each case in the
comparison matrix has its associated parameters [x_1 - x_9] recorded, a histogram of the
distance from the correct decision for each of these parameters may be generated.
This process may be illustrated by a simple example using the first parameter, x_1,
which, as was previously described, is:

x_1: N = number of retrieved cases (i.e., cardinality of retrieved set (area of
histogram in Fig. 9)).

Fig. 15 shows an example of cross-tabulation of classification distances and number
of retrieved cases for each probe. By way of this example, the processing of 573 
probes is shown, achieving a correct classification for 242 of them. Additionally, 214 
were classified as one rate class off (where 114 at (-1) and 100 at (+1) equal 214). 
Further, 99 were two rate classes off (where 64 at (-2) and 35 at (+2) equal 99), and 
18 were 3 or more classes off. These 573 cases may also be subdivided in ten bins, 
representing ranges of the number of retrieved cases used for each probe. By way of 
example, 41 cases had between 1 and 4 retrieved cases (first column), while 58 cases 
used more than 40 retrieved cases (last column). Fig. 16 illustrates the same cross- 
tabulation using percentages instead of the number of cases. According to an 
embodiment of the invention, this table may be referred to as matrix D(i, j), where
i = 1, ..., 7 (the seven distances considered), and j = 1, ..., 10 (the ten bins considered).

Note that this table contains the same percentages illustrated in Fig. 15, once we
normalize the values by the total number of cases, tabulated for different values of x_1.
For instance, the total percentage of Correct Classifications (CC) in Fig. 14 may be
defined as the sum of the elements on the main diagonal, i.e.:

$$\%CC = \sum_{i} M(i,i)$$

where M(i, i) denotes a main-diagonal element of the comparison matrix of Fig. 14.

The same percentage may be obtained by adding the percentages distributed along the
fourth row (corresponding to Distance 0), i.e.:

$$\%CC = \sum_{j=1}^{10} D(4,j)$$

The percentage of correct classification may increase with the number of cases
retrieved for each probe (fourth row, distance = 0). By analyzing a given column on 
this table, an estimate may be derived of the probability of correct/incorrect 
classification, given that the number of cases is in the range of values corresponding 
to the column. 
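
A cross-tabulation of this kind, and the per-column estimate of the probability of correct classification, might be computed as in the following sketch. The record layout and the bin boundaries are hypothetical. Only the first bin (1 to 4 retrieved cases) and the last bin (more than 40) are taken from the example in the text. Distances beyond three rate classes are pooled into the outer rows, giving seven rows.

```python
import bisect

def cross_tabulate(records, bin_upper_bounds, max_distance=3):
    """Build D as percentages: rows are distances -3..+3, columns are bins of N."""
    n_rows = 2 * max_distance + 1
    n_cols = len(bin_upper_bounds) + 1          # last column is the open-ended bin
    table = [[0.0] * n_cols for _ in range(n_rows)]
    share = 100.0 / len(records)
    for distance, n_retrieved in records:
        row = max(-max_distance, min(max_distance, distance)) + max_distance
        col = bisect.bisect_left(bin_upper_bounds, n_retrieved)  # bounds are inclusive
        table[row][col] += share
    return table

def p_correct_given_bin(table, col, max_distance=3):
    """Share of the column that lies in the distance-0 row."""
    column_total = sum(row[col] for row in table)
    return table[max_distance][col] / column_total if column_total else None

# Toy probes as (distance, number of retrieved cases); three bins: 1-4, 5-40, over 40.
records = [(0, 45), (0, 50), (-1, 10), (2, 3), (0, 12)]
D = cross_tabulate(records, bin_upper_bounds=[4, 40])
percent_correct = sum(D[3])                     # fourth row corresponds to distance 0
```

With these made-up probes, the column for more than 40 retrieved cases contains only correct classifications, while the middle column is split evenly between correct and incorrect ones.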

According to an embodiment of the invention, step 1230 may comprise translating the 
conditional probability of misclassification into a soft constraint for each parameter 
x_i (for i = 1, ..., 9). By way of example, if all misclassifications are determined to be
equally undesirable, the only concern may be with the row corresponding to distance 
equal 0 (i.e., correct classification), as illustrated in Fig. 17. By way of another 
example, it may be desirable to penalize more those misclassifications that are two or 
three rate classes away from the correct decision. Therefore, an overall performance 
function may be formulated that aggregates the rewards of correct classifications with 
increasing penalties for misclassifications. Although various types of aggregating 
function may be used to achieve these ends, one possible aggregating function may 
use a weighted sum of rewards and penalties. Specifically, for each bin (range of
values) of the parameter under consideration, a reward/penalty w_i may be
considered for each of the seven classification distances i. For instance:

$$f(\mathrm{Bin}_j) = \sum_{i=1}^{7} w_i \, D(i,j)$$

where, for example, the weight vector W = [w_i], i = 1, ..., 7, is W = [-11, -6, -1, 4, -1, -6, -11].

This weight vector indicates that misclassifying a decision by three or more rate 
classes is eleven times worse than a misclassification that is one rate class away. 
Except for the fourth element, which indicates the reward for correct classifications, 
all other elements in vector W indicate the penalty value for the corresponding degree
of misclassification. Fig. 18 illustrates the result of applying the performance
function f(Bin_j) to the values of Fig. 16, i.e., matrix D.
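
The weighted sum of rewards and penalties can be illustrated with a short sketch that applies the weight vector W to every column (bin) of a cross-tabulation. The table values below are made up for illustration and are not the values of Fig. 16. The function name `performance` is hypothetical.

```python
# Weight per distance row: -3 or less, -2, -1, 0 (correct), +1, +2, +3 or more.
W = [-11, -6, -1, 4, -1, -6, -11]

def performance(table, weights=W):
    """Weighted sum over the distance rows, computed separately for each bin (column)."""
    n_cols = len(table[0])
    return [sum(w * row[j] for w, row in zip(weights, table)) for j in range(n_cols)]

# Toy table with three bins (few, some, many retrieved cases); made-up values.
D = [
    [1, 0, 0],   # distance -3 or less
    [2, 1, 0],   # distance -2
    [3, 2, 1],   # distance -1
    [2, 5, 8],   # distance  0 (correct)
    [2, 2, 1],   # distance +1
    [1, 1, 0],   # distance +2
    [1, 0, 0],   # distance +3 or more
]
scores = performance(D)
```

Here the bin with few retrieved cases scores negative because its misclassifications outweigh its correct decisions, while the bin with many retrieved cases scores highest.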

By interpreting the values of Fig. 18 as degree of preference, a fuzzy membership
function C_i(x_i) is derived, indicating the tolerable and desirable ranges for each
parameter x_i. According to an embodiment of the invention, a possible way to convert
the values of Fig. 18 to a fuzzy membership function is to replace any negative value 
with a zero and then normalize the elements by the largest value. In this example, the 
result of this process is illustrated in Fig. 19 and 20. 
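
The conversion just described (replace negative values with zero, then normalize by the largest value) is small enough to sketch directly. The function name and the input values are hypothetical. The case where no value is positive is handled by returning all zeros, which is an assumption not addressed in the text.

```python
def to_membership(values):
    """Clip negative performance values to zero, then scale so the largest becomes 1."""
    clipped = [max(0.0, v) for v in values]
    peak = max(clipped)
    return [c / peak for c in clipped] if peak > 0 else clipped

membership = to_membership([-37, 4, 30])
```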

As previously described, the membership function of a fuzzy set is a mapping from 
the universe of discourse (the range of values of the performance function) into the 
interval [0,1]. The membership function has a natural preference interpretation. The 
support of the membership function C_i(x_i) represents the range of tolerable (i.e.,
acceptable) values of x_i. The support of the fuzzy set C_i(x_i) is defined as the interval
of values of x_i for which C_i(x_i) > 0. Similarly, the core may represent the most
desirable range of values and establish a top preference. The core of the membership
function C_i(x_i) may be defined as the interval of values x_i for which C_i(x_i) = 1. In the
example of Fig. 20, the support is [22, infinity] and the core is [40, infinity]. By
definition, a feature value falling inside the core will receive a preference value of 1.
As the feature value moves away from the most desirable range, its associated
preference value will decrease from 1 to 0. At this point, the information may be
translated into a soft constraint representing our preference for the values of parameter
x_i. The soft constraint may be referred to as C_i(x_i), as illustrated in Fig. 20.
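
A soft constraint with the support and core quoted for Fig. 20 can be evaluated as in the sketch below. The linear rise between the start of the support (22) and the start of the core (40) is an assumption. The actual shape between those points is given by the figure and may differ.

```python
def soft_constraint(x, support_start=22.0, core_start=40.0):
    """Membership is 0 up to the support boundary, 1 inside the core, linear between."""
    if x <= support_start:
        return 0.0
    if x >= core_start:
        return 1.0
    return (x - support_start) / (core_start - support_start)

preferences = [soft_constraint(n) for n in (10, 22, 31, 40, 100)]
```

A query that retrieves 31 cases sits halfway between the two boundaries and therefore receives a preference of 0.5 under this assumed shape.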

According to an embodiment of the invention, a fourth step of this invention may be 
to define a run-time function to evaluate the confidence measure for each new query. 
By way of example, after executing the third step for each of the nine parameters,
nine soft constraints C_i(x_i), i = 1, ..., 9, may be obtained. A soft constraint evaluation
(SCE) vector is generated that contains the degree to which each parameter satisfies
its corresponding soft constraint: SCE = [C_1(x_1), ..., C_9(x_9)]. The Confidence Factor
(CF_j) to be associated to each new case j may be computed at run-time as the
intersection of all the soft constraint evaluations contained in the SCE vector:

$$CF_j = \bigcap_{i=1}^{9} C_i(x_i) = \min_{i=1,\ldots,9} C_i(x_i)$$

According to an embodiment of the invention, all elements in the Soft Constraint 
Evaluation (SCE) vector may be real numbers in the interval [0,1]. Therefore the 
Confidence Factor CFj will also be a real number in the interval [0,1]. Nine potential 
soft constraints represent the most desirable fuzzy ranges for the nine parameters 
described above. Given a new probe, its computed parameter vector X = [x_1 - x_9] may
be used to determine the degree to which all soft constraints are satisfied (SCE),
leading to the computation of its Confidence Factor CF. 
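
The run-time computation can be sketched as a minimum over the soft constraint evaluations. The `ramp` helper, the two example constraints, and their boundary values are hypothetical. A deployed system would use one constraint per parameter, derived from the training experiments.

```python
def ramp(lower, upper):
    """Hypothetical soft constraint: 0 at or below `lower`, 1 at or above `upper`."""
    def membership(x):
        return min(1.0, max(0.0, (x - lower) / (upper - lower)))
    return membership

def confidence_factor(parameters, constraints):
    """Return the confidence factor (minimum of the SCE vector) and the SCE vector."""
    sce = [constraints[name](value) for name, value in parameters.items()]
    return min(sce), sce

# Two illustrative constraints: number of retrieved cases and strength of mode.
constraints = {"x1": ramp(22, 40), "x9": ramp(0.25, 0.5)}
cf, sce = confidence_factor({"x1": 31, "x9": 0.75}, constraints)
```

Because the minimum is used, the least satisfied constraint alone determines the result, so correlated parameters that all look favorable do not reinforce one another.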

As described above, a four-step process may be used to compute the confidence factor
at run-time. The minimum threshold for the confidence value may be
determined by a series of experiments with the data, to avoid being too restrictive or 
too inclusive. A higher-than-needed threshold may decrease the coverage provided by 
the CBE by rejecting too many correct solutions (False Negatives). As the threshold 
is lowered, the number of accepted solutions is increased and therefore, an increase in 
coverage is obtained. However, a lower-than-needed threshold may decrease the
accuracy provided by the CBE by accepting too many incorrect solutions (False 
Positives). Therefore, it may be desirable to obtain a threshold using a method that 
balances these two concepts. 

According to an embodiment of the invention, coverage for any given threshold level
t may include accepting n(t) cases out of K. Given a Case Base with K cases, the
function g_1(t) may be defined as a measure of coverage:

$$g_1(t) = \frac{n(t)}{K}$$

For accuracy, the performance function f, as previously defined, may be used (e.g.,
aggregate the rewards of correct classifications with the increasing penalties for
misclassifications) and may be adapted to the entire Case Base to evaluate its
accuracy for any given threshold t. As the value of t is modified, more decisions may
be accepted or rejected, modifying the entries of the comparison matrix M = [M(i, j)].



$$g_2(t) = \sum_{i} \kappa \, R \, M(i,i) + \sum_{i} \sum_{j \neq i} p(i,j) \, R \, M(i,j)$$

Specifically, the function g_2(t) may be defined as a measure of relative accuracy,
where M(i, j) is the (i, j) element of the comparison matrix illustrated in Fig. 14. It
may represent the percentage of cases classified in cell i while the correct
classification was cell j. Therefore (i = j) implies a correct classification. The
percentage may be computed over the total cases for which the decision has been
accepted (i.e., its corresponding confidence was above the threshold). Further, κ*R
may be a reward for correct classification (where κ indicates a static multiple of basic
reward R), and p(i, j)*R may be the penalty for incorrect classification (where p(i, j)
determines a dynamic multiple of basic reward R).

For simplicity, R = 1 may be used. The penalty function p(i, j) may indicate the
increasing penalty for misclassifications farther away from the correct one. Many
possible versions of function p(i, j) can be used. By way of example, the vector
W = [-11, -6, -1, 4, -1, -6, -11] corresponds to the values:

$$\kappa = 4 \quad \text{and} \quad p(i,j) = -5\,|i-j| + 4$$

A linear penalty function p(i, j) is illustrated in Fig. 30. It will be recognized by those
of ordinary skill in the art that other linear functions may also be used. If
over-penalization for larger misclassifications is desired, a non-linear penalty function
may be used, such as p(i, j) = -3(i-j)^2 + 4, as illustrated in Fig. 31.
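
As a check on the linear penalty, the sketch below uses the form -5|i-j| + 4, with the sign chosen so that distances of one, two, and three rate classes reproduce the penalties -1, -6, and -11 found in W. It then evaluates the relative accuracy as a sum of rewards and penalties over a comparison matrix. The reward multiple is written as `kappa` to keep it apart from the number of cases K. The function names and the toy matrix are hypothetical.

```python
def linear_penalty(i, j):
    """Penalty multiple for classifying in cell i when the correct cell is j (i != j)."""
    return -5 * abs(i - j) + 4

def relative_accuracy(matrix, penalty=linear_penalty, kappa=4, reward=1):
    """Sum kappa*R*M(i,i) over the diagonal plus p(i,j)*R*M(i,j) over the rest."""
    total = 0
    for i, row in enumerate(matrix):
        for j, share in enumerate(row):
            multiple = kappa if i == j else penalty(i, j)
            total += multiple * reward * share
    return total

penalties = [linear_penalty(0, d) for d in (1, 2, 3)]

# Toy comparison matrix in percent over accepted cases (made-up values).
M = [
    [40, 20],
    [0, 40],
]
g2 = relative_accuracy(M)
```

Passing a different `penalty` function lets a user encode other cost structures without changing the aggregation.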

The selection of a penalty function may be left as a choice to a user to represent the 
cost of different misclassifications. According to an embodiment of the invention, if 
there were no differences among such costs, then a simplified version of g2(r) could 
be used to measure the CBE accuracy, e.g.: 



$$g_2(t) = \sum_{i} \kappa \, R \, M(i,i)$$



Functions g_1(t) and g_2(t) may be defined to measure coverage and relative accuracy,
respectively. Function g_1(t) may be monotonically non-increasing in the value t
(larger values of t will not increase coverage), while g_2(t) may be monotonically
non-decreasing in the value t (larger values of t will not decrease relative accuracy,
unless the set is empty). The two functions may be aggregated into a global accuracy
function A(t) to evaluate the overall system performance under different thresholds t:

$$A(t) = g_1(t) \times g_2(t)$$

where × indicates scalar multiplication.
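
Coverage, relative accuracy, and global accuracy for one threshold might be computed as sketched here. The record layout is hypothetical: each leave-one-out probe is a (confidence factor, classification distance) pair. Relative accuracy is expressed as an average score per accepted case (fractions in place of percentages), a case is accepted when its confidence is at least the threshold, and the linear penalty -5|d| + 4 with a reward multiple of 4 is assumed.

```python
def evaluate_threshold(records, t, kappa=4, reward=1):
    """Return (coverage, relative accuracy, global accuracy) for threshold t."""
    accepted = [d for cf, d in records if cf >= t]
    coverage = len(accepted) / len(records)            # n(t) out of K
    if not accepted:
        return coverage, 0.0, 0.0                      # nothing accepted, nothing to score
    score = sum(
        kappa * reward if d == 0 else (-5 * abs(d) + 4) * reward
        for d in accepted
    )
    relative = score / len(accepted)                   # averaged over accepted cases
    return coverage, relative, coverage * relative     # product of the two measures

# Toy probes (made-up values): higher confidence tends to go with smaller distances.
records = [(0.9, 0), (0.8, 0), (0.7, 0), (0.6, 1),
           (0.4, 0), (0.3, -2), (0.2, 3), (0.1, 1)]
at_half = evaluate_threshold(records, 0.5)
at_seven_tenths = evaluate_threshold(records, 0.7)
```

In this toy data, raising the threshold from 0.5 to 0.7 lowers coverage but raises relative accuracy, which is the trade-off the global accuracy is meant to balance.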

The function A(t) provides a measure of accuracy combined with the coverage of 
cases. Fig. 21 illustrates an example of the computation of Coverage, Relative 
Accuracy, and Global Accuracy as a function of threshold t. In this example, t = 0.1 
has the largest coverage, t = 0.7 has the largest relative accuracy, and t = 0.5 has the 
largest global accuracy.

There are many approaches that may be used to maximize the aggregate function A(t)
to obtain the best value for threshold t. Any reasonable optimization algorithm (such
as a gradient-based search, or a combined gradient and binary search) may be used to
this effect. For example, in Fig. 21, the value of A(t) may be computed for nine
values of t. According to an embodiment of the invention, values may be explored to
determine a best threshold. By way of example only, the neighborhood of t = 0.5 may
be explored, such as by a gradient method, to determine that the value t = 0.55 is the
best threshold.
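
One simple way to carry out such an exploration is a coarse scan over nine thresholds followed by a finer scan around the best coarse value. Note that this coarse-to-fine grid scan is a substitute used here for illustration and is not the gradient method mentioned above. The function name, step sizes, and the toy accuracy curve (peaking at 0.55) are all hypothetical.

```python
def best_threshold(accuracy, coarse_step=0.1, fine_step=0.01):
    """Scan thresholds coarsely, then refine around the best coarse value."""
    n_coarse = int(round(1 / coarse_step))
    coarse = [round(k * coarse_step, 10) for k in range(1, n_coarse)]
    t0 = max(coarse, key=accuracy)
    lo, hi = max(0.0, t0 - coarse_step), min(1.0, t0 + coarse_step)
    n_fine = int(round((hi - lo) / fine_step))
    fine = [round(lo + k * fine_step, 10) for k in range(n_fine + 1)]
    return max(fine, key=accuracy)

# Toy global accuracy curve with a single peak (made up for illustration).
def toy_accuracy(t):
    return -(t - 0.55) ** 2

t_best = best_threshold(toy_accuracy)
```

In practice `accuracy` would evaluate the global accuracy on the leave-one-out records for the given threshold, and the scan could be replaced by any reasonable optimizer.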

As described above, the present invention provides many advantages. According to 
an embodiment of the present invention, incremental deployment of the CBE may be 
achieved, instead of postponing its deployment until an entire Case Base has been 
completely populated. Further, a determination may be made for which applications 
(e.g., characterized by specific medical conditions) the CBE can provide sufficiently 
high confidence in the output to shift its use from a human underwriter productivity 
tool to an automated placement tool. 

According to an embodiment of the invention, as the Case Base is augmented and/or
updated by new resolved applications, the quality of the retrieved cases may change. 
The present invention may enable monitoring of the quality of the Case Base, 
indicating the part of the CB requiring growth or scrubbing. By way of example, 
regions within the Case Base with insufficient coverage (small area histograms, low 
similarity levels) may be identified, as well as regions containing inconsistent 
decisions (bimodal histograms), and ambiguous regions (very broad histograms). 

According to an embodiment of the invention, by establishing a confidence threshold, 
a determination can be made, for each application processed by the CBE, if the output 
can be used directly to place the application or if it will be a suggestion to be revised 
by a human underwriter. 

According to an embodiment of the invention, a process as described above may be 
used after the deployment of the CBE, as part of the Case Base maintenance. As the 
Case Base is enriched by the influx of new cases, the distribution of its cases may
also vary. Regions of the CB that were sparsely populated might now contain a larger
number of cases. Therefore, as part of the tuning of the CBE, one should periodically 
recompute various steps within the process to update the soft constraints on each of 
the parameters. As part of the same maintenance, the value of the best threshold may 
also be updated and used in the process. 

NETWORK-BASED UNDERWRITING SYSTEM 

Fig. 22 illustrates a system 2200 according to an embodiment of the present invention. 
The system 2200 comprises a plurality of computer devices 2205 (or "computers") 
used by a plurality of users to connect to a network 2202 through a plurality of 
connection providers (CPs) 2210. The network 2202 may be any network that 
permits multiple computers to connect and interact. According to an embodiment of 
the invention, the network 2202 may be comprised of a dedicated line to connect the 
plurality of the users, such as the Internet, an intranet, a local area network (LAN), a 
wide area network (WAN), a wireless network, or other type of network. Each of the 
CPs 2210 may be a provider that connects the users to the network 2202. For 
example, the CP 2210 may be an Internet service provider (ISP), a dial-up access 
means, such as a modem, or other manner of connecting to the network 2202. In 
actual practice, there may be significantly more users connected to the system 2200 
than shown in Fig. 22. This would mean that there would be additional users who are 
connected through the same CPs 2210 shown or through another CP 2210. 
Nevertheless, for purposes of illustration, the discussion will presume three computer 
devices 2205 are connected to the network 2202 through two CPs 2210. 

According to an embodiment of the invention, the computer devices 2205a-2205c 
may each make use of any device (e.g., a computer, a wireless telephone, a personal
digital assistant, etc.) capable of accessing the network 2202 through the CP 2210.
Alternatively, some or all of the computer devices 2205a-2205c may access the
network 2202 through a direct connection, such as a T1 line, or similar connection.
Fig. 22 shows the three computer devices 2205a-2205c, each having a connection to
the network 2202 through the CP 2210a and the CP 2210b. The computer devices
2205a-2205c may each make use of a personal computer such as a computer located 
in a user's home, or may use other devices which allow the user to access and interact 
with others on the network 2202. A central controller module 2212 may also have a 
connection to the network 2202 as described above. The central controller module 
2212 may communicate with one or more modules, such as one or more data storage 
modules 2236, one or more evaluation modules 2224, one or more case database 
modules 2240 or other modules discussed in greater detail below. 

Each of the computer devices 2205a-2205c used may contain a processor module 
2204, a display module 2208, and a user interface module 2206. Each of the 
computer devices 2205a-2205c may have at least one user interface module 2206 for 
interacting with and controlling the computer. The user interface module 2206 may be
comprised of one or more of a keyboard, a joystick, a touchpad, a mouse, a scanner or 
any similar device or combination of devices. Each of the computers 2205a-2205c 
may also include a display module 2208, such as a CRT display or other device. 
According to an embodiment of the invention, a developer, a user of a production 
system, and/or a change management module may use a computer device 2205. 

The central controller module 2212 may maintain a connection to the network 2202 
such as through a transmitter module 2214 and a receiver module 2216. The 
transmitter module 2214 and the receiver module 2216 may be comprised of 
conventional devices that enable the central controller module 2212 to interact with 
the network 2202. According to an embodiment of the invention, the transmitter 
module 2214 and the receiver module 2216 may be integral with the central controller 
module 2212. According to another embodiment of the invention, the transmitter 
module 2214 and the receiver module 2216 may be portions of one connection device. 
The connection to the network 2202 by the central controller module 2212 and the 
computer devices 2205 may be a high speed, large bandwidth connection, such as 
through a T1 or a T3 line, a cable connection, a telephone line connection, a DSL
connection, or another similar type of connection. The central controller module 2212 
functions to permit the computer devices 2205a-2205c to interact with each other in
connection with various applications, messaging services and other services which
may be provided through the system 2200. 

The central controller module 2212 preferably comprises either a single server 
computer or a plurality of server computers configured to appear to the computer 
devices 2205a-2205c as a single resource. The central controller module 2212 
communicates with a number of modules. Each module will now be described in 
greater detail. 

A processor module 2218 may be responsible for carrying out processing within the 
system 2200. According to an embodiment of the invention, the processor module 
2218 may handle high-level processing, and may comprise a math co-processor or 
other processing devices. 

A decision component category module 2220 and an application category module 
2222 may handle categories for various insurance policies and decision components. 
As described above, each decision component and each application may be assigned a 
category. The decision component category module 2220 may include information 
related to the category assigned for each decision component, including a cross- 
reference to the application associated with each decision component, the assigned 
category or categories, and/or other information. The application category module 
2222 may include information related to the category assigned for each application, 
including a cross-reference to the decision components associated with each 
application, the assigned category or categories, and/or other information. 

An evaluation module 2224 may include an evaluation of a decision component using 
one or more rules, where the rules may be fuzzy logic rules. The evaluation module 
2224 may direct the application of one or more fuzzy logic rules to one or more 
decision components. Further, the evaluation module 2224 may direct the application 
of one or more fuzzy logic rules to one or more policies within a case database 2240,
to be described in greater detail below.

A measurement module 2226 may include measurements assigned to one or more
decision components. As described above, a measurement may be assigned to each 
decision component based on an evaluation, such as an evaluation with a fuzzy logic 
rule. The measurement module 2226 may associate a measurement with each 
decision component, direct the generation of the measurement, and/or include 
information related to a measurement. 

An issue module 2228 may handle issuing an insurance policy based on the 
evaluation and measurements of one or more decision components and the application 
itself. According to an embodiment of the invention, decisions whether to ultimately 
issue an insurance policy or not to issue an insurance policy may be communicated to 
an applicant through the issue module 2228. The issue module 2228 may associate 
issuance of an insurance policy with an applicant, with various measurement(s) and 
evaluation(s) of one or more policies and/or decision components and other 
information. 

A retrieval module 2230 may be responsible for retrieving cases from a case database 
module 2240. According to an embodiment of the invention, queries submitted by a 
user for case-based reasoning may be coordinated through the retrieval module 2230 
for retrieving cases. Other information and functions related to case retrieval may
also be available. 

A ranking module 2232 may be responsible for ranking cases retrieved based on one 
or more queries received from a user. According to an embodiment of the invention, 
the ranking module 2232 may maintain information related to cases and associated 
with one or more queries. The ranking module 2232 may associate each case with the 
ranking(s) associated with one or more queries. Other information may also be 
associated with the ranking module 2232. 

A rate class module 2234 may handle various designations of rate classes for one or 
more insurance policies. According to an embodiment of the invention, each 
application may be assigned a rate class, where the premiums paid by the applicant 
are based on the rate class. The rate class module 2234 may associate a rate class 
with each insurance application, and may assign a rate class based on evaluation and
measurements of various applications and decision components, as well as based on a 
decision by one or more underwriters. Other information may also be associated with 
the rate class module 2234. 

Data may be stored in a data storage module 2236. The data storage module 2236 
stores a plurality of digital files. According to an embodiment of the invention, a 
plurality of data storage modules 2236 may be used and located on one or more data 
storage devices, where the data storage devices are combined or separate from the 
controller module 2212. One or more data storage modules 2236 may also be used to 
archive information. 

An adaptation module 2238 may be responsible for adapting the results of one or 
more queries to determine which previous cases are most similar to the present
application for insurance. Other information may also be associated with
the adaptation module 2238. 

All cases used in case-based reasoning may be stored in a case database module
2240. According to an embodiment of the invention, a plurality of case database 
modules 2240 may be used and located on one or more data storage devices, where 
the data storage devices are combined or separate from the controller module 2212. 

While the system 2200 of Fig. 22 discloses the requester device 2205 connected to the 
network 2202, it should be understood that a personal digital assistant ("PDA"), a 
mobile telephone, a television, or another device that permits access to the network 
2202 may be used to access the system of the present invention.

According to another embodiment of the invention, a computer-usable and writeable 
medium having a plurality of computer readable program code stored therein may be 
provided for practicing the process of the present invention. The process and system 
of the present invention may be implemented within a variety of operating systems, 
such as a Windows® operating system, various versions of a Unix-based operating 
system (e.g., a Hewlett Packard, a Red Hat, or a Linux version of a Unix-based 
operating system), or various versions of an AS/400-based operating system. For 
example, the computer-usable and writeable medium may comprise a CD-ROM,
a floppy disk, a hard disk, or any other computer-usable medium. One or more
of the components of the system 2200 may comprise computer readable program code 
in the form of functional instructions stored in the computer-usable medium such that 
when the computer-usable medium is installed on the system 2200, those components 
cause the system 2200 to perform the functions described. The computer readable 
program code for the present invention may also be bundled with other computer 
readable program software. 

According to one embodiment, the central controller module 2212, the transmitter 
module 2214, the receiver module 2216, the processor module 2218, the decision 
component category module 2220, application category module 2222, evaluation 
module 2224, measurement module 2226, issue module 2228, retrieval module 2230, 
ranking module 2232, rate class module 2234, data storage module 2236, adaptation 
module 2238, and case database module 2240 may each comprise computer-readable 
code that, when installed on a computer, performs the functions described above. 
Also, only some of the components may be provided in computer-readable code. 

Additionally, various entities and combinations of entities may employ a computer to 
implement the components performing the above-described functions. According to 
an embodiment of the invention, the computer may be a standard computer 
comprising an input device, an output device, a processor device, and a data storage 
device. According to other embodiments of the invention, various components may 
be computers in different departments within the same corporation or entity. Other 
computer configurations may also be used. According to another embodiment of the 
invention, various components may be separate entities such as corporations or 
limited liability companies. Other embodiments, in compliance with applicable laws 
and regulations, may also be used. 

According to one specific embodiment of the present invention, the system may 
comprise components of a software system. The system may operate on a network 
and may be connected to other systems sharing a common database. Other hardware 
arrangements may also be provided.

Other embodiments, uses and advantages of the present invention will be apparent to
those skilled in the art from consideration of the specification and practice of the 
invention disclosed herein. The specification and examples should be considered 
exemplary only. The intended scope of the invention is only limited by the claims 
appended hereto. 



INFORMATION SUMMARIZATION 

The fuzzy rule-based decision engine and the case-based decision engine may need to 
capture the medical/actuarial knowledge required to evaluate and underwrite an
application. They may do so by using a rule set or a case base, respectively. 
However, both decision engines may also need access to all the relevant information 
that characterizes the new application. While the structured component of this 
information can be captured as data and stored into a database ("DB"), the free-form 
nature of an attending physician statement (APS) may not be suitable to automated 
parsing and interpretation. Therefore, for each application requiring an APS, a 
summarization tool may be used that will convert all the essential input variables from 
that statement into a structured form, suitable for storage in a DB and for supporting 
automated decision systems. Furthermore, if the decision engines were not capable of 
handling this new application, then the use of the APS summarization tool may be a 
productivity aid for a human underwriter, rather than an automation tool. 

The present invention may be used in connection with an engine to automate 
decisions in business, commercial, or manufacturing processes. Such an engine may 
be based on (but not limited to) rules and/or cases. A process and system may be 
provided to structure and summarize key information required by a reasoning system. 
According to an embodiment of the invention, summarized information required by a 
reasoning system may be used to underwrite insurance applications, and establish a 
rate class corresponding to the perceived risk of the applicant. Such risk may be 
characterized by several information sources, such as, but not limited to, the 
application form, the APS, laboratory data, medical insurance consortium databases,
motor vehicle registration databases, etc. Once this information has been gathered
and compiled, the application risk may be evaluated by a human underwriter or by an 
automated decision system. This evaluation is carried out leveraging the medical and 
actuarial knowledge of the human underwriter, which is captured in its essence by the 
automated reasoning system. According to an embodiment of the invention, an APS 
summarization tool may capture the relevant variables that characterize a given 
medical impairment, allowing an automated reasoning system to determine the degree 
of severity of such impairment and to estimate the underlying insurance risk. 

According to an embodiment of the invention, a focus of this invention on the 
individual medical impairments of a patient may provide 1) incremental deployment 
of the Automated Underwriting system as summaries for new impairments can be 
developed and added; 2) efficient coverage, by addressing the most frequent 
impairments first, according to a Pareto analysis of their frequencies; 3) efficient 
description of the impairment, by including in the summary only the variables that 
could have an impact on the decision. 

By way of example, an aspect of the present invention will be described in terms of 
underwriting of an application for a fixed life insurance policy. Although the 
description focuses on the use of a reasoning system to automate the underwriting 
process of insurance policies, it will be understood by one of ordinary skill in the art 
that the applicability of this invention may be much broader, as it may apply to other 
reasoning system applications. 

According to an embodiment of the invention, a method for executing and 
manipulating an APS summarization tool may occur as illustrated in Fig. 23. At step 
2300, a summarizer with the appropriate medical knowledge would log into a web- 
based system to begin the summarization process. According to an embodiment of 
the invention, the APS summarization system may include a general form plus 
various condition-specific forms, which are then filled out by the summarizer. The
summarizer may first fill out the general form, which contains data fields relevant to
all applicants. Condition-specific forms are then filled out as needed, as the
summarizer discovers various features present in the APS being summarized.

At step 2302, a summarizer may verify that the APS corresponds to the correct
applicant. This may be done by matching information on the APS itself with 
information about the applicant provided by the system. By way of example, an 
applicant's name, date of birth, and social security number could be matched. If a 
match is not made, the summarizer may note this by checking the appropriate 
checkbox. According to an embodiment of the invention, at step 2304, failure to 
match an APS to an applicant would end the summarizer's session for that applicant,
and the summarizer may recommend corrective action. 

At step 2306, the general form is filled out. Fig. 24 illustrates a general form within a 
graphical user interface 2400 according to an embodiment of the invention. Graphical 
user interface 2400 may comprise access to any network browser, such as Netscape 
Navigator, Microsoft Explorer, or others. Other means of accessing a network may 
also be used. Graphical user interface 2400 may include a control area 2402, whereby 
a summarizer may control various aspects of graphical user interface 2400. Control 
may include moving to various portions of the network via the graphical user 
interface 2400, printing information from the network, searching for information 
within the network, and other functions used within a browser. 

According to an embodiment of the invention, a general form 2400 may provide a 
fixed structure 2406 to capture the data within the system. According to an 
embodiment of the invention, different sections of the form may be organized into 
fields that are structured to provide only a fixed set of choices for the summarizer. 
This may be done to standardize the different pieces of information contained in the 
APS. By way of example, a fixed set of choices may be provided to a summarizer via 
a pull-down menu 2408. For fields that cannot be treated as pull-down menus (e.g.,
dates, numeric values of lab tests), such as entry field 2410 labeled as "Initial date," 
validation may be performed to ensure that data entry errors are minimized, and to 
check that values are within allowable pre-determined limits. According to an 
embodiment of the invention, validation may include a "client-side" validation, 
designed to give the summarizer an immediate response if any of the data is 
incorrectly entered. A "client-side" validation may be achieved through JavaScript 
code embedded in the web pages. According to an embodiment of the invention, 
validation may include a "server-side" validation, which may be performed after data
submission. "Server-side" validation may be designed primarily as a fail-safe check to 
prevent erroneous data from entering the business-critical database. 
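
By way of illustration only, the fail-safe "server-side" check described above might be sketched in Python as follows. The field names, the date format, and the numeric limits are assumptions chosen for this example and are not taken from the forms of Fig. 24.

```python
from datetime import date, datetime

# Hypothetical allowable limits for numeric fields (illustrative values only).
NUMERIC_LIMITS = {"systolic_bp": (50, 300), "diastolic_bp": (30, 200)}


def validate_date(text, today=None):
    """Return an error message, or None if `text` is a valid non-future date."""
    today = today or date.today()
    try:
        value = datetime.strptime(text, "%m/%d/%Y").date()
    except ValueError:
        return "not a valid MM/DD/YYYY date"
    if value > today:
        return "date is in the future"
    return None


def validate_numeric(field, text):
    """Return an error message, or None if `text` is a number within limits."""
    low, high = NUMERIC_LIMITS[field]
    try:
        value = float(text)
    except ValueError:
        return "not a number"
    if not low <= value <= high:
        return f"outside allowable range {low}-{high}"
    return None


def validate_submission(form, today=None):
    """Collect all errors so a bad record can be rejected before storage."""
    errors = {}
    message = validate_date(form.get("initial_date", ""), today)
    if message:
        errors["initial_date"] = message
    for field in NUMERIC_LIMITS:
        message = validate_numeric(field, form.get(field, ""))
        if message:
            errors[field] = message
    return errors
```

In such a sketch, a submission would be written to the database only when `validate_submission` returns an empty dictionary.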

According to an embodiment of the invention, link section 2404 may provide access 
to other portions of general form 2400. As illustrated in Fig. 24, link section 2404 
may include links (such as hypertext links) to portions of general form 2400 that 
relate to blood pressure, family history, nicotine use, build, lipids, alcohol use, 
cardiovascular fitness and tests, final check, comments, abnormal physical symptoms, 
abnormal blood results, abnormal urine results, abnormal pap test, mammogram, 
abnormal colonoscopy, chest x-ray, pulmonary function, substance abuse, and non- 
medical history. Other information within a general form 2400 may also be provided, 
and as such, may be linked through link section 2404. 

According to an embodiment of the invention, an APS summary may distinguish 
between a blank data field and answers such as "don't know" or "not applicable," 
thereby ensuring the completeness of the summary. For a general form submission, a 
final validation pass may be performed at step 2308 to alert the summarizer if certain 
required fields are blank. If required fields are blank, the system may require a 
summarizer to return to step 2306 and complete the general form. If the summarizer 
wishes to indicate that the particular piece of information is not known, they may be 
required to specifically indicate so, thereby maintaining information about what 
information is specifically not known. However, it will be recognized that not all 
fields will necessarily require information. For example, certain fields may be 
"conditionally mandatory," meaning that they require an answer only if other fields 
have been filled out in a particular way. Use of conditionally mandatory fields may 
ensure that all necessary information is gathered. Further, ensuring that all required 
fields have been filled may also ensure that the necessary information is gathered. 
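
By way of illustration only, the final validation pass of step 2308 might be sketched in Python as follows. The sketch treats "don't know" and "not applicable" as explicit answers rather than blanks, and checks both required and conditionally mandatory fields. All field names and trigger values are hypothetical.

```python
# Explicit answers that count as "filled in" even though no value is known.
DONT_KNOW = "don't know"
NOT_APPLICABLE = "not applicable"

# Hypothetical field rules: always-required fields, and fields required
# only when another field holds a particular answer.
REQUIRED = ["nicotine_use", "family_history"]
CONDITIONAL = {
    # field: (controlling field, answer that makes `field` mandatory)
    "nicotine_quit_date": ("nicotine_use", "former"),
}


def is_blank(form, field):
    """A field is blank only if missing or empty; "don't know" is an answer."""
    value = form.get(field)
    return value is None or str(value).strip() == ""


def missing_fields(form):
    """Return the fields that must be completed before submission."""
    missing = [f for f in REQUIRED if is_blank(form, f)]
    for field, (controller, trigger) in CONDITIONAL.items():
        if form.get(controller) == trigger and is_blank(form, field):
            missing.append(field)
    return missing
```

A non-empty result would correspond to returning the summarizer to step 2306.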

When the general form has been filled out and validated at step 2308, with all of the 
required fields entered, it may be necessary to complete one or more condition- 
specific forms. At step 2310, it is determined if any condition-specific forms are 
required. If no condition-specific forms are required, the results may be submitted to a
database or other storage device for use at a later time at step 2320. 

If a condition-specific form is required, a summarizer may select a condition-specific 
form to fill in at step 2312. According to an embodiment of the invention, a
summarizer may move from the general form to any of the condition-specific forms
by following a hypertext link embedded within the general form. By way of example,
a link to a condition-specific form may be similar to, and/or the same as, links located
within link portion 2404. Further, links to condition-specific forms may be located 
within link portion 2404. A portion of the knowledge of which condition-specific 
forms are necessary may be obtained while filling out the general form. In the current 
example of life insurance underwriting, these condition-specific forms may include 
hypertension, diabetes, etc. 

Fig. 25 illustrates an example of a condition-specific form for hypertension within a 
graphical user interface 2500 according to an embodiment of the invention. Graphical 
user interface 2500 may comprise access to any network browser, such as Netscape 
Navigator, Microsoft Explorer, or other browser. Other manners of accessing a 
network may also be used. Graphical user interface 2500 may include a control area 
2502, whereby a summarizer may control various aspects of graphical user interface
2500. Control may include moving to various portions of the network via the graphical
user interface 2500, printing information from the network, searching for information 
within the network, and other functions used within a browser. 

Graphical user interface 2500 displays the hypertension-specific form, which may 
include various sections for inputting information related to hypertension. In the 
hypertension-specific form illustrated in Fig. 25, initial identification section 2504
may enable a summarizer to provide initial identification information, including 
whether an applicant has hypertension, the type of hypertension, whether it was 
secondary hypertension, and if so, how the cause was removed or cured. According 
to an embodiment of the invention, pull-down menus may be used to ensure that
information entered is standardized for each patient. Other information may also be
gathered in initial identification section 2504.

EKG section 2506 may enable a summarizer to provide EKG information, including
EKG readings within a specified time period (e.g., 6 months), chest X-rays within a
specified time period (e.g., 6 months), and other information related to EKG readings.
According to an embodiment of the invention, pull-down menus may be used to
ensure that information entered is standardized for each patient. Patient cooperation 
section 2508 may enable a summarizer to provide information related to a patient's 
cooperation, including whether the patient has cooperated, whether a patient's blood 
pressure is under control, and if so, for how many months, and other information 
related to a patient's cooperation in dealing with hypertension. According to an 
embodiment of the invention, pull-down menus may be used to ensure that
information entered is standardized for each patient. 

Blood pressure section 2510 may enable a summarizer to enter blood pressure 
readings corresponding to various dates. According to an embodiment of the 
invention, separate entry fields may be provided for the date the blood pressure
reading was taken, the systolic reading (SBP), and the diastolic reading (DBP).
Other information may also be entered in blood pressure section 2510. Further, it will 
be understood by those skilled in the art that other information related to hypertension 
may also be entered in a hypertension form displayed on graphical user interface 
2500. 

At step 2314, a summarizer fills out a condition-specific form. For a condition- 
specific form, a final validation pass may be performed at step 2316 to alert the 
summarizer if certain required fields are blank. If required fields are blank, the 
system may require a summarizer to return to step 2314 and complete the condition- 
specific form. As with a general form, if the summarizer wishes to indicate that the 
particular piece of information is not known, they may be required to specifically 
indicate so, thereby facilitating the tracking of what information is specifically not 
known. However, it will be recognized that not all fields will necessarily require 
information. For example, certain fields may be "conditionally mandatory," meaning 
that they require an answer only if other fields have been filled out in a particular 
way. Use of conditionally mandatory fields may ensure that all necessary information 
is gathered. Further, ensuring that all required fields have been filled may also ensure
that the necessary information is gathered. 

If the condition-specific form has been filled out and validated at step 2316, with all 
of the required fields entered, a summarizer may determine if additional condition- 
specific forms are necessary at step 2318. If additional condition-specific forms are 
necessary, a summarizer may return to step 2312 and select the appropriate condition- 
specific form in which to enter information. If no additional condition-specific forms 
are required, the results may be submitted to a database or other storage device for use 
at a later time at step 2320. 

Once the summarization is complete for a general form and any selected condition- 
specific forms, the summarizer may submit the results, such as described in step 2320. 
The data may then be transferred over a network, such as the Internet, and stored in a 
database for later use. According to an embodiment of the invention, different 
categorical data fields may be presented to the summarizer as text, but for space 
efficiency are encoded as integer values in the database. A "translation table" to the 
corresponding field meanings may then be provided as part of the design of the APS 
summary. The APS summarizer may provide a structured list of topics, thereby 
enabling a trained person to summarize the most significant information currently 
contained in a handwritten or typewritten APS. Further, the APS summarizer may 
provide an efficient description of the data content of the APS. As stated above, the 
APS itself can be several tens of pages of doctor's notes. The APS summary is 
designed to capture only the data fields that are relevant to the problem at hand. In 
addition, a structured and organized description of the APS data may be provided. An 
APS itself can adhere to any arbitrary order because of different doctors' styles. The
APS summary may provide a single consistent format for the data, as required for an
automated system, which may also greatly facilitate the human underwriter's job.
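
By way of illustration only, the "translation table" between the text presented to the summarizer and the integer values stored in the database might be sketched in Python as follows. The field and its codes are hypothetical and serve only to show the two-way mapping.

```python
# Hypothetical translation table for one categorical field.
HYPERTENSION_TYPE = {0: "none", 1: "essential", 2: "secondary", 3: "don't know"}

# Reverse lookup used when storing a summarizer's selection.
_TEXT_TO_CODE = {text: code for code, text in HYPERTENSION_TYPE.items()}


def encode(text):
    """Map the text shown to the summarizer to the integer stored in the database."""
    return _TEXT_TO_CODE[text]


def decode(code):
    """Map a stored integer back to its field meaning."""
    return HYPERTENSION_TYPE[code]
```

Because the same table drives both directions, a value read back from the database decodes to exactly the text that was selected.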

Since the APS summary may be captured in a database, the information contained in 
it may be easily available to any computer-based application. Again, this is a 
requirement for an automated underwriting system, but it may provide many other 
advantages as well. For example, the APS data may otherwise be very difficult to 
analyze statistically, to categorize, or to classify. Since the APS summary forms can
be web-based, the physical location of the summarizers may be immaterial. The
original APS sheets can be received in location X, scanned, sent over the Internet to 
location Y, where the APS summary is filled out, and the digital data from the 
summary can be submitted and stored on a database server in location Z. Further, the 
automated decision engine can be in any fourth location, as could an individual 
running queries against the APS summary database for statistical analysis or reporting 
purposes. 

According to an embodiment of the invention, general and condition-specific forms
may be written in HTML and JavaScript, which provide the validation functionality. 
A system for storing filled out summary data into a remote database has also been 
created. This system was created using JavaBeans and JSP. Testing by experienced 
underwriters has been performed. The HTML summary forms are displayed to the 
underwriters via a web browser, and the data from an actual APS is entered onto the 
form. The underwriter comments and feedback are captured on the form as well, and 
used to aid the continual improvement of the forms. In choosing which condition- 
specific forms to create, a statistical analysis was done of the frequencies of the 
various medical conditions. The conditions that are most frequent were chosen to be 
worked on first. The APS summary does not have to cover all conditions before it is 
put into production. Deployment of the APS summary may be progressive, covering 
new conditions one by one as new forms become available. Applicants with APS 
requirements that are not covered in the current APS summary may be underwritten 
using the usual procedures. Condition-specific forms may therefore be added to the 
APS summary in order to increase coverage of applicants by the digital underwriting 
system. 
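
By way of illustration only, the frequency analysis used to decide which condition-specific forms to create first might be sketched in Python as follows. The function name, the coverage target, and the sample data in the usage note are assumptions made for this example; the specification does not state the actual frequencies.

```python
from collections import Counter


def pareto_order(observed_conditions, target_coverage=0.8):
    """Return the most frequent conditions that together reach `target_coverage`."""
    counts = Counter(observed_conditions)
    total = sum(counts.values())
    selected, covered = [], 0
    # Walk the conditions from most to least frequent, stopping once the
    # selected conditions account for the target share of all observations.
    for condition, count in counts.most_common():
        if covered / total >= target_coverage:
            break
        selected.append(condition)
        covered += count
    return selected
```

For a hypothetical sample of five hypertension cases, three diabetes cases, and two asthma cases, `pareto_order` with the default target returns `["hypertension", "diabetes"]`, so forms for those two conditions would be developed first.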

OPTIMIZATION OF FUZZY RULE-BASED AND CASE-BASED DECISION 
ENGINES 

According to an embodiment of the present invention, fuzzy rule-based and case- 
based reasoning may be used to automate decisions in business, commercial, or 
manufacturing processes. Specifically, a process and system to automate the
determination of optimal design parameters that impact the quality of the output of the
decision engines is described. 

According to an embodiment of the invention, the optimization aspect may provide a 
structured and robust search and optimization methodology for identifying and tuning 
the decision thresholds (cutoffs) of the fuzzy rules and internal parameters of the 
fuzzy rule-based decision engine ("RBE"), and the internal parameters of the case- 
based decision engine ("CBE"). Benefits may include a minimization of the
degree of rate class assignment mismatch between that of an expert human 
underwriter and automated rate class decisions. Further, the maintenance of the 
accuracy of rule-based and case-based decision-making as decision guidelines evolve 
with time may be achieved. In addition, identification of ideal parameter 
combinations that govern the automated decision-making process may occur. 

The system and process of the present invention may apply a class of stochastic
global search algorithms known as evolutionary algorithms to perform parameter
identification and tuning. Such algorithms may be executed utilizing principles of 
natural evolution and may be robust adaptive search schemes suitable for searching 
non-linear, discontinuous, and high-dimensional spaces. Moreover, this tuning 
approach may not require an explicit mathematical description of the multi- 
dimensional search space. Instead, this tuning approach may rely solely on an 
objective function that is capable of producing a relative measure of alternative 
solutions.

According to an embodiment of the invention, an evolutionary algorithm
may be used for optimization within an RBE and CBE. By way of example, an 
evolutionary algorithm ("EA") may include genetic algorithms, evolutionary 
programming, evolution strategies, and genetic programming. The principles of these 
related techniques may define a general paradigm that is based on a simulation of 
natural evolution. EAs may perform their search by maintaining at any time t a 
population $P(t) = \{P_1(t), P_2(t), \ldots, P_p(t)\}$ of individuals. In this example, "genetic"
operators that model simplified rules of biological evolution are applied to create the
new and desirably superior population $P(t+1)$. Such a process may continue
until a sufficiently good population is achieved, or some other termination condition
is satisfied. Each $P_i(t) \in P(t)$ represents, via an internal data structure, a potential
solution to the original problem. The choice of an appropriate data structure for
representing solutions may be more an "art" than a "science" due to the plurality of 
data structures suitable for a given problem. However, the choice of an appropriate 
representation may be a critical step in a successful application of EAs. Effort may be 
required to select a data structure that is compact, minimally superfluous, and can 
avoid creation of infeasible individuals. For instance, if the problem domain requires 
finding an optimal real vector from the space defined by dissimilarly bounded real 
coordinates, it may be more appropriate to choose as a representation a real-set-array 
(e.g., bounded sets of real numbers) instead of a representation capable of generating 
bit strings. A representation that generates bit strings may create many infeasible 
individuals, and would certainly be longer than a more compact sequence of real
numbers. Closely linked to a choice of representation of solutions may be a choice of
a fitness function $\psi : P(t) \to \mathbb{R}$ that assigns credit to candidate solutions. Individuals
in a population are assigned fitness values according to some evaluation criterion. 
Fitness values may measure how well individuals represent solutions to the problem. 
Highly fit individuals are more likely to create offspring by recombination or 
mutation operations. Weak individuals are less likely to be picked for reproduction, 
so they eventually die out. A mutation operator introduces genetic variations in the 
population by randomly modifying some of the building blocks of individuals. 
Evolutionary algorithms are essentially parallel by design, and at each evolutionary 
step a breadth search of increasingly optimal sub-regions of the options space is 
performed. Evolutionary search is a powerful technique of solving problems, and is 
applicable to a wide variety of practical problems that are nearly intractable with other 
conventional optimization techniques. Practical evolutionary search schemes do not
guarantee convergence to the global optimum in a predetermined finite time, but they
are often capable of finding very good and consistent approximate solutions, and they
have been shown to converge asymptotically under mild conditions.
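
By way of illustration only, the evolutionary loop described above might be sketched in Python as follows. Individuals are represented as bounded real vectors, so that no infeasible individual is created. The specific operators (Gaussian mutation clipped to the bounds, uniform recombination, binary tournament selection, and retention of the best individual) are assumptions chosen for this example and are not stated to be those of the invention.

```python
import random


def evolve(objective, bounds, pop_size=20, generations=50, seed=0):
    """Minimize `objective` over the box `bounds` with a simple evolutionary loop.

    Individuals are real vectors, one coordinate per (low, high) bound, so
    every individual is feasible by construction.
    """
    rng = random.Random(seed)

    def random_individual():
        return [rng.uniform(low, high) for low, high in bounds]

    def mutate(individual):
        # Gaussian perturbation, clipped so the child stays inside the bounds.
        child = []
        for value, (low, high) in zip(individual, bounds):
            value += rng.gauss(0.0, 0.1 * (high - low))
            child.append(min(high, max(low, value)))
        return child

    def recombine(a, b):
        # Uniform crossover: each coordinate comes from one of the two parents.
        return [x if rng.random() < 0.5 else y for x, y in zip(a, b)]

    def select(population):
        # Binary tournament: the fitter of two random individuals reproduces.
        a, b = rng.sample(population, 2)
        return a if objective(a) <= objective(b) else b

    population = [random_individual() for _ in range(pop_size)]
    best_history = []
    for _ in range(generations):
        best = min(population, key=objective)
        best_history.append(objective(best))
        offspring = [
            mutate(recombine(select(population), select(population)))
            for _ in range(pop_size - 1)
        ]
        population = [best] + offspring  # keep the best individual (elitism)
    best = min(population, key=objective)
    best_history.append(objective(best))
    return best, best_history
```

Because the best individual is carried into each new population, the recorded best objective value never increases from one generation to the next, although, as noted above, convergence to the global optimum within a fixed number of generations is not guaranteed.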

An evolutionary algorithm may be used within a process and system for automating 
the tuning and maintenance of fuzzy rule-based and case-based decision systems used 
for automated decisions in insurance underwriting. While this approach is 
demonstrated for insurance underwriting, it is broadly applicable to diverse rule-based 
and case-based decision-making applications in business, commercial, and
manufacturing processes. Specifically, we describe a structured and robust search and 
optimization methodology based on a configurable multi-stage evolutionary algorithm 
for identifying and tuning the decision thresholds of the fuzzy rules and internal 
parameters of the fuzzy rule-based decision engine and the internal parameters of the 
case-based decision engine. The parameters of the decision systems impact the 
quality of the decision-making, and are therefore critical. Furthermore, this tuning 
methodology can be used periodically to update and maintain the decision engines. 

As stated above, these fuzzy logic systems may have many parameters that can be 
freely chosen. These parameters may either be fit to reproduce a given set of 
decisions, or set by management in order to achieve certain results, or a combination 
of the two. A large set of cases may be provided by the company as a "certified case 
base." According to an embodiment of the invention, the statistics of the certified 
case base may closely match the statistics of insurance applications received in a 
reasonable time window. According to an embodiment of the invention, there will be 
many more cases than free parameters, so that the system will be over-determined. 
Then, an optimal solution may be found which minimizes the classification error 
between a decision engine's output and the supplied cases. When considering 
maintenance of a system, it may be convenient and advantageous that the parameters 
are chosen by optimization against a set of certified cases. New fuzzy rules and 
certified cases may be added, or aggregation rules may change. The fuzzy logic 
systems may be kept current, allowing the insurance company to implement changes 
quickly and with zero variability. 

The parameter identification and tuning problem which may be presented in this 
invention can be mathematically described as a minimization problem: 

\[ \min_{x \in \mathcal{X}} \psi(x), \quad \text{where } \mathcal{X} = \mathcal{X}_1 \times \mathcal{X}_2 \times \cdots \times \mathcal{X}_n \subset \mathbb{R}^n \text{ and } \psi : \mathcal{X} \to \mathbb{R}^+ \]

where $\mathcal{X}$ is an n-dimensional bounded hyper-volume (parametric search space) in the 
n-dimensional space of reals, $x$ is a parameter vector, and $\psi$ is the objective function 
that maps the parametric search space to the non-negative real line. 

Fig. 26 illustrates such a minimization (optimization) problem according to an 
embodiment of the invention in the context of the application domain, where the 
search space $\mathcal{X}$ corresponds to the space of decision engine designs induced by the 
parameters imbedded in the decision engine, and the objective function $\psi$ measures 
the corresponding degree of rate-class assignment mismatch between that of the 
expert human underwriter and the decision engine for the certified case base. An 
evolutionary algorithm iteratively generates trial solutions (trial parameter vectors in 
the space $\mathcal{X}$) and uses their corresponding consequent degree of rate-class 
assignment mismatch as the search feedback. Thus, at step 2602, a space of decision 
engine designs is probed. At step 2604, a mismatch matrix, which will be described 
in greater detail below, is generated based on the rate-class decisions generated for the 
cases by the decision engine. Penalties for mismatching cases are assigned at step 
2606. The evolutionary algorithm uses the corresponding degree of rate-class 
assignment mismatches, and the associated penalties to provide feedback to the 
decision engine at step 2608. The system may then refine the internal parameters and 
decision thresholds in the decision engine at step 2602, and proceed through the 
process again. Thus, an iterative process may be performed. 
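
By way of a non-limiting illustration, the objective function that scores one trial parameter vector may be sketched as follows. The toy threshold engine, the single numeric case feature, and the penalty values are hypothetical stand-ins introduced only for this sketch; they are not the decision engine or the actuarial penalties of the invention.

```python
from typing import List, Sequence, Tuple


def toy_engine(thresholds: Sequence[float], value: float) -> int:
    """Hypothetical stand-in for the decision engine.

    The assigned class index is the number of thresholds the value exceeds,
    so index 0 is the lowest-risk class.
    """
    return sum(1 for t in sorted(thresholds) if value > t)


def mismatch_objective(
    thresholds: Sequence[float],
    cases: Sequence[Tuple[float, int]],
    penalty: Sequence[Sequence[float]],
) -> float:
    """Score a trial parameter vector against certified cases.

    Each case is (feature value, certified class index). The penalty matrix
    is indexed as penalty[certified class][engine class].
    """
    total = 0.0
    for value, certified_class in cases:
        engine_class = toy_engine(thresholds, value)
        total += penalty[certified_class][engine_class]
    return total


# Hypothetical certified cases and penalties, for illustration only.
cases: List[Tuple[float, int]] = [(0.1, 0), (0.5, 1), (0.9, 2)]
penalty = [[0, 1, 2], [1, 0, 1], [2, 1, 0]]
score_good = mismatch_objective([0.3, 0.7], cases, penalty)
score_poor = mismatch_objective([0.6, 0.7], cases, penalty)
```

In this sketch a lower score indicates closer agreement with the certified decisions, which is the feedback an evolutionary search would use to rank trial parameter vectors.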

Fig. 27 illustrates an example of an encoded population maintained by the 
evolutionary algorithm at a given generation. According to an embodiment of the 
invention, each individual in the population is a trial vector of design parameters 
representing fuzzy rule thresholds and internal parameters of the decision engine. 
Each percentage entry may represent a value of a trial parameter that falls within a 
corresponding bounded real line. Each trial solution vector may be used to initialize 
an instance of the decision engine, following which each of the cases in the certified 
case base is evaluated. 
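
By way of a non-limiting illustration, a percentage-encoded individual may be decoded into parameter values as sketched below. The linear mapping from a percentage onto its interval, the function name, and the example bounds are assumptions made for this sketch; the text above states only that each percentage entry represents a value within a corresponding bounded real line.

```python
from typing import List, Sequence, Tuple


def decode_individual(
    percentages: Sequence[float],
    bounds: Sequence[Tuple[float, float]],
) -> List[float]:
    """Map each percentage entry (0 to 100) linearly onto its bounded interval."""
    if len(percentages) != len(bounds):
        raise ValueError("one (low, high) bound pair is needed per parameter")
    values = []
    for pct, (low, high) in zip(percentages, bounds):
        if not 0.0 <= pct <= 100.0:
            raise ValueError("percentage entries must lie in [0, 100]")
        values.append(low + (pct / 100.0) * (high - low))
    return values


# Hypothetical bounds for three tunable parameters.
decoded = decode_individual([0, 50, 100], [(0, 10), (100, 200), (-1, 1)])
```

The decoded vector could then be used to initialize one instance of the decision engine before the certified cases are evaluated.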

Fig. 28 illustrates a process schematic for an evaluation system according to an 
embodiment of the invention. Trial design parameters are provided at an input 
module 2802. The trial design parameters are automatically input to decision engine 
2804. Case subset 2808 from certified case base 2806 is input into decision engine 
2804. Certified case base 2806 may comprise cases that have been certified as being 
correct. Case subset 2808 may be a predetermined number of cases from certified 
case base 2806. According to an embodiment of the invention, case subset 2808 may 
comprise two thousand (2000) certified cases. According to an embodiment of the 
invention, case subset 2808 may comprise a number of cases that is a multiple of the 
number of tunable parameters of decision engine 2804. The cases within case subset 
2808 are processed in decision engine 2804, and output as decision engine case decisions 2810. 

Once all the cases in the certified case base are evaluated, a square confusion matrix 
2814 is created. According to an embodiment of the invention, confusion matrix 
2814 may be generated by comparing decision engine case decisions 2810 and 
certified case decisions 2812. The rows of confusion matrix 2814 may correspond to 
certified case decisions 2812 as determined by an expert human underwriter, and the 
columns of confusion matrix 2814 may correspond to the decision engine case 
decisions 2810 for the cases in the certified case base. By way of example, assume a 
case has been assigned a category S from certified case decision 2812 (from the 
matrix 2814) and a category PB from decision engine decision 2810. Under these 
categorizations, the case would count towards an entry in the cell at row 3 and column 
1. In this example, the certified case decision 2812 places the case in a higher risk 
category, while the decision engine case decision 2810 places the case in a lower risk 
category. Therefore, for this particular case, the decision engine 2804 has been more 
liberal in decision-making. By way of another example, if on the other hand both the 
certified case decision 2812 and the decision engine case decision 2810 agree on 
categorizing the case in class S, then the case would count towards an entry in 
the cell at row 3 and column 3. By way of another example, if the certified case 
decision 2812 is PB, but the machine decision 2810 is S, then the machine 
decision is more strict. 
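
By way of a non-limiting illustration, the tallying of cases into the confusion matrix may be sketched as follows. The class list is a hypothetical ordering chosen so that PB falls in position 1 and S in position 3, as in the example above; the name of the middle class and the function name are assumptions.

```python
from typing import List, Sequence

# Hypothetical ordering, lowest risk first; only PB and S appear in the text.
RATE_CLASSES = ["PB", "P", "S"]


def build_confusion_matrix(
    certified: Sequence[str],
    engine: Sequence[str],
    classes: Sequence[str] = RATE_CLASSES,
) -> List[List[int]]:
    """Rows follow the certified decisions, columns follow the engine decisions."""
    if len(certified) != len(engine):
        raise ValueError("each case needs one certified and one engine decision")
    index = {name: i for i, name in enumerate(classes)}
    size = len(classes)
    matrix = [[0] * size for _ in range(size)]
    for cert, eng in zip(certified, engine):
        matrix[index[cert]][index[eng]] += 1
    return matrix


# Three cases mirroring the examples above: liberal, matching, and strict.
matrix = build_confusion_matrix(["S", "S", "PB"], ["PB", "S", "S"])
```

Python lists are zero-indexed, so the cell described in the text as row 3 and column 1 is `matrix[2][0]` in this sketch. Entries below the main diagonal correspond to more liberal engine decisions and entries above it to stricter ones.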

According to an embodiment of the invention, it may be desirable to use a decision 
engine that is able to place the maximum number of certified cases along the main 
diagonal of confusion matrix 2814. It may also be desirable to determine those 
parameters 2802 for decision engine 2804 that produce such results (e.g., minimize 
the degree of rate class assignment confusion or mismatch between certified case 
decisions 2812 and decision engine case decisions 2810). Confusion matrix 2814 
may be used as the foundation to compute an aggregate mismatch penalty or score, 
using penalty module 2816. According to an embodiment of the invention, a penalty 
matrix may be derived from actuarial studies and multiplied element-by-element 
with the cells of the confusion matrix 2814 to generate an aggregate penalty/score for 
a trial vector of parameters in the evolutionary search. The aggregate penalty may be 
obtained by summing these products over all rows and columns of the matrix, with each 
index running from 1 to T, as the confusion matrix M may be of dimension T x T. Other 
process systems may also be used to achieve the present invention. 
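
By way of a non-limiting illustration, the aggregate penalty may be computed as sketched below. The function name and the example penalty values are hypothetical; actual penalties would be derived from actuarial studies as described above.

```python
from typing import Sequence


def aggregate_penalty(
    confusion: Sequence[Sequence[float]],
    penalty: Sequence[Sequence[float]],
) -> float:
    """Sum the element-by-element product of two T x T matrices."""
    size = len(confusion)
    if len(penalty) != size:
        raise ValueError("confusion and penalty matrices must have the same size")
    for conf_row, pen_row in zip(confusion, penalty):
        if len(conf_row) != size or len(pen_row) != size:
            raise ValueError("both matrices must be square (T x T)")
    return sum(
        confusion[i][j] * penalty[i][j]
        for i in range(size)
        for j in range(size)
    )


# Hypothetical 2 x 2 example: zero penalty on the diagonal, and a heavier
# penalty for the more liberal mismatch (row 2, column 1).
score = aggregate_penalty([[5, 1], [2, 4]], [[0, 1], [3, 0]])
```

Placing zeros on the main diagonal of the penalty matrix means correctly classified cases contribute nothing, so the score falls as more certified cases land on the diagonal of the confusion matrix.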

According to an embodiment of the invention, an evolutionary algorithm may utilize 
only the selection and stochastic variation (mutation) operations to evolve generations 
of trial solutions. While the selection operation may seek to exploit known search 
space regions, the mutation operation may seek to explore new regions of the search 
space. Such an algorithm is known to those of ordinary skill in the art. One example 
of the theoretical foundation for such an algorithm class appears in Modeling and 
Convergence Analysis of Distributed Coevolutionary Algorithms, Raj Subbu and 
Arthur C. Sanderson, Proceedings of the IEEE International Congress on 
Evolutionary Computation, 2000. 

Fig. 29 illustrates an example of the mechanics of such an evolutionary process. At 
step 2902, an initial population of trial decision engine parameters is created. 
Proportional selection occurs at step 2904 and an intermediate population is created at 
step 2906. Stochastic variation occurs at step 2908, and a new population is created at 
step 2910. The new population may then be subject to proportional selection at step 
2904, thereby creating an iterative process. 
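
By way of a non-limiting illustration, the loop of Fig. 29 may be sketched as follows. Because the problem is a minimization, the selection weight 1 / (1 + penalty), the Gaussian perturbation, and the clamping of mutated values to the bounds are all assumptions made for this sketch and are not specified in the text above.

```python
import random
from typing import Callable, List, Sequence, Tuple

Bounds = Sequence[Tuple[float, float]]


def proportional_selection(population, penalties, rng):
    """Sample an intermediate population; lower penalty gives a higher weight."""
    weights = [1.0 / (1.0 + p) for p in penalties]  # assumed fitness transform
    return rng.choices(population, weights=weights, k=len(population))


def stochastic_variation(population, bounds: Bounds, sigma: float, rng):
    """Perturb every parameter with Gaussian noise, clamped to its bounds."""
    mutated = []
    for individual in population:
        child = []
        for value, (low, high) in zip(individual, bounds):
            moved = value + rng.gauss(0.0, sigma * (high - low))
            child.append(min(max(moved, low), high))
        mutated.append(child)
    return mutated


def evolve(
    objective: Callable[[List[float]], float],
    bounds: Bounds,
    pop_size: int,
    generations: int,
    sigma: float,
    seed: int = 0,
) -> List[float]:
    rng = random.Random(seed)
    # Initial population of trial parameter vectors (step 2902).
    population = [[rng.uniform(lo, hi) for lo, hi in bounds] for _ in range(pop_size)]
    for _ in range(generations):
        penalties = [objective(ind) for ind in population]
        # Selection yields the intermediate population (steps 2904 and 2906).
        intermediate = proportional_selection(population, penalties, rng)
        # Variation yields the new population (steps 2908 and 2910).
        population = stochastic_variation(intermediate, bounds, sigma, rng)
    return min(population, key=objective)


best = evolve(lambda x: (x[0] - 0.3) ** 2, [(0.0, 1.0)], 20, 30, 0.05)
```

This plain loop does not by itself keep the best individual from one generation to the next, so the best score may fluctuate between generations.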

According to an embodiment of the invention, the evolutionary algorithm may use a 
specified fixed population size and operate in one or more stages, each stage of which 
may be user configurable. A stage is specified by a tuple consisting of a fixed number 
of generations and normalized spread of a Gaussian distribution governing 
randomized sampling. A given solution (also called the parent) in generation i may be 
improved by cloning it to create two identical child solutions from the parent solution. 
The first child solution may be mutated according to a uniform distribution within the 
allowable search bounds. The second child solution may then be mutated according 
to the Gaussian distribution for generation i. If the mutated solution falls outside of 
the allowable search bounds, then the sampling is repeated a few times until an 
acceptable sample is found. If no acceptable sample is found within the allotted 
number of trials, then the second child solution may be mutated according to a 
uniform distribution. The best of the parent and two child solutions is retained and is 
transferred to the population at generation i+1. In addition, it is ensured via elitism 
that the improvement in the best performing individual of each generation of 
evolution i+n (where n is an increasing whole number) is a monotone function. 
According to an embodiment of the invention, the process may be repeated until 
generation i+n has been generated, where i+n is a whole number. 
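
By way of a non-limiting illustration, the multi-stage procedure described above may be sketched as follows. The function names, the default of five resampling trials, the scaling of the Gaussian spread by the width of each interval, and the test objective are assumptions made for this sketch.

```python
import random
from typing import Callable, List, Sequence, Tuple

Bounds = Sequence[Tuple[float, float]]


def uniform_sample(bounds: Bounds, rng: random.Random) -> List[float]:
    """Draw every parameter uniformly within its allowable bounds."""
    return [rng.uniform(low, high) for low, high in bounds]


def gaussian_mutation(parent, bounds: Bounds, sigma: float, rng, max_trials: int = 5):
    """Resample around the parent; fall back to uniform if all trials fail."""
    for _ in range(max_trials):
        candidate = [
            rng.gauss(value, sigma * (high - low))
            for value, (low, high) in zip(parent, bounds)
        ]
        if all(low <= c <= high for c, (low, high) in zip(candidate, bounds)):
            return candidate
    return uniform_sample(bounds, rng)


def multi_stage_search(
    objective: Callable[[List[float]], float],
    bounds: Bounds,
    pop_size: int,
    stages: Sequence[Tuple[int, float]],  # (generations, normalized spread)
    seed: int = 0,
):
    rng = random.Random(seed)
    population = [uniform_sample(bounds, rng) for _ in range(pop_size)]
    scores = [objective(ind) for ind in population]
    history = [min(scores)]
    for generations, sigma in stages:
        for _ in range(generations):
            for k, parent in enumerate(population):
                children = [
                    uniform_sample(bounds, rng),
                    gaussian_mutation(parent, bounds, sigma, rng),
                ]
                # Keep the best of the parent and its two children.
                for child in children:
                    child_score = objective(child)
                    if child_score < scores[k]:
                        population[k], scores[k] = child, child_score
            history.append(min(scores))
    best = min(range(pop_size), key=scores.__getitem__)
    return population[best], history


best, history = multi_stage_search(
    lambda x: sum((v - 0.25) ** 2 for v in x),
    [(0.0, 1.0)] * 3,
    pop_size=10,
    stages=[(5, 0.2), (5, 0.05)],
    seed=1,
)
```

Because each population slot is replaced only by a strictly better child, the best score recorded in `history` never increases from one generation to the next, which is one simple way to obtain the monotone behavior attributed to elitism above. A wide spread in an early stage followed by a narrow spread in a later stage moves the search from exploration toward refinement.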

While the invention has been particularly shown and described within the framework 
of an insurance underwriting application, it will be appreciated that variations and 
modifications can be effected by a person of ordinary skill in the art without departing 
from the scope of the invention. For example, one of ordinary skill in the art will 
recognize that the fuzzy rule-based or case-based engine of this invention can be 
applied to any other transaction-oriented process in which underlying risk estimation 
is required to determine the price structure (premium, price, commission, etc.) of an 
offered product, such as insurance, re-insurance, annuities, etc. Furthermore, the 
determination of the confidence factor and the optimization of the decision engines 
transcend the scope of insurance underwriting. A confidence factor obtained in the 
manner described in this document could be determined from any application of a 
case-based reasoner (whether it is fuzzy or not). Similarly, the engine optimization 
process described in this document can be applied to any engine in which the structure 
of the engine has been defined and the parametric values of the engine need to be 
specified to meet a predefined performance metric. Furthermore, one of ordinary skill 
in the art will recognize that such decision engines do not need to be restricted to 
insurance underwriting applications. 

IN THE CLAIMS: 

1. A method for underwriting an insurance application comprising the steps of: 

receiving a request (400) to underwrite an insurance application, where the request 
includes information about at least one application component; 

evaluating the at least one application component (410) based on at least one rule 
associated with the at least one application component; 

assigning a measurement (420) to the at least one application component; and 

assigning the at least one application component (430) to a specific component 
category out of a plurality of component categories based on the assigned 
measurement. 

2. The method according to claim 1, further comprising the step of assigning the 
insurance application to a specific application category (440) out of a plurality of 
application categories, where the one specific application category is based at least in 
part on the specific component category. 

3. The method according to claim 2, wherein the plurality of component categories are 
identical to the plurality of application categories. 

4. The method according to claim 1, wherein the step of evaluating the at least one 
application component includes the sub-steps of: 

defining the at least one rule; 

assigning the at least one rule to the at least one application component; and 

defining a plurality of parameters associated with the at least one rule, wherein each 
parameter corresponds to a particular category of the plurality of component 
categories. 

5. The method according to claim 4, further comprising the step of assigning the 
insurance application to a specific application category (440) out of a plurality of 
application categories, wherein the specific application category is based at least in 
part on the component category. 

6. The method according to claim 5, wherein the plurality of component categories are 
identical to the plurality of application categories. 

7. The method according to claim 1, wherein the at least one rule is a fuzzy logic rule. 

8. A method for underwriting an insurance application comprising the steps of: 

receiving a request (400) to underwrite an insurance application, wherein the 
insurance application includes a plurality of application components; 

evaluating the plurality of application components (410) based on a plurality of rules, 
wherein each of the plurality of rules is associated with each of the plurality of 
application components; 

assigning a measurement (420) to each of the plurality of application components; and 

assigning each of the plurality of application components (430) to a specific 
component category out of a plurality of component categories based on the assigned 
measurement. 

9. The method according to claim 8, further comprising the step of assigning the 
insurance application to a specific application category (440) out of a plurality of 
application categories, wherein the specific application category is based at least in 
part on the component categories for each of the plurality of component categories. 

10. The method according to claim 9, wherein the plurality of component categories 
are identical to the plurality of application categories. 

11. The method according to claim 8, wherein the step of evaluating the plurality of 
application components includes the sub-steps of: 

defining the plurality of rules; 

assigning each of the plurality of rules to one of the plurality of components; and 

defining a plurality of parameters associated with each of the plurality of rules, 
wherein each parameter corresponds to a particular category of the plurality of 
component categories. 

12. The method according to claim 11, further comprising the step of assigning the 
insurance application to a specific application category (440) out of a plurality of 
application categories, wherein the specific application category is based at least in 
part on the component categories for each of the plurality of component categories. 

13. The method according to claim 12, wherein the plurality of component categories 
are identical to the plurality of application categories. 

14. The method according to claim 8, wherein the plurality of rules are fuzzy logic 
rules. 

15. A method for underwriting an insurance application comprising the steps of: 

receiving a request (400) to underwrite an insurance application for a customer, 
wherein the insurance application includes a plurality of application components; 

evaluating the plurality of application components (410) based on a plurality of rules, 
wherein each of the plurality of rules is associated with each of the plurality of 
application components; 

assigning a measurement (420) to each of the plurality of application components; and 

assigning each of the plurality of application components (430) to a specific 
component category out of a plurality of component categories based on the assigned 
measurement; 

assigning the insurance application to a specific application category out of a plurality 
of application categories, wherein the specific application category is based at least in 
part on the specific component category; and 

issuing the insurance application (450) to the customer. 

16. The method according to claim 15, wherein the plurality of component categories 
are identical to the plurality of application categories. 

17. The method according to claim 15, wherein the step of evaluating the at least one 
application component includes the sub-steps of: 

defining the at least one rule; 

assigning the at least one rule to the at least one component; and 

defining a plurality of parameters associated with the at least one rule, wherein each 
parameter corresponds to a particular category of the plurality of component 
categories. 

18. The method according to claim 17, further comprising the step of assigning the 
insurance application to a specific application category out of a plurality of 
application categories, wherein the specific application category is based at least in 
part on the component category. 

19. The method according to claim 18, wherein the plurality of component categories 
are identical to the plurality of application categories. 

20. The method according to claim 15, wherein the plurality of rules are fuzzy logic 
rules. 

21. A computer program stored on a computer readable medium, wherein the 
computer program causes a computer to perform the steps of: 

receiving a request (400) to underwrite an insurance application, wherein the 
insurance application includes at least one application component; 

evaluating the at least one application component (410) based on at least one rule 
associated with the at least one application component; 

assigning a measurement (420) to the at least one application component; and 

assigning the at least one application component (430) to a specific component 
category out of a plurality of component categories based on the assigned 
measurement. 

22. The computer readable medium according to claim 21, further comprising the step 
of assigning the insurance application to a specific application category (440) out of a 
plurality of application categories, wherein the specific application category is based 
at least in part on the specific component category. 

23. The computer readable medium according to claim 22, wherein the plurality of 
component categories are identical to the plurality of application categories. 

24. The computer readable medium according to claim 21, wherein the step of 
evaluating the at least one application component includes the sub-steps of: 

defining the at least one rule; 

assigning the at least one rule to the at least one component; and 

defining a plurality of parameters associated with the at least one rule, wherein each 
parameter corresponds to a particular component category of the plurality of 
component categories. 

25. The computer readable medium according to claim 24, further comprising the step 
of assigning the insurance application to a specific application category (440) out of a 
plurality of application categories, wherein the specific application category is based 
at least in part on the specific component category. 

26. The computer readable medium according to claim 25, wherein the plurality of 
component categories are identical to the plurality of application categories. 

27. The computer readable medium according to claim 21, wherein the at least one 
rule is a fuzzy logic rule. 

28. A method for underwriting a transaction orientated process comprising the steps 
of: 

receiving a request (400) to underwrite a transaction, where the request includes 
information about at least one transaction component; 

evaluating the at least one transaction component (410) based on at least one rule 
associated with the at least one transaction component, wherein the at least one rule is 
a fuzzy logic rule; 

assigning a measurement (420) to the at least one transaction component; 

assigning the at least one transaction component to a specific component category 
(430) out of a plurality of component categories based on the assigned measurement; 
and 

assigning the transaction to a specific transaction category (440) out of a plurality of 
transaction categories, where the one specific transaction category is based at least in 
part on the specific component category. 

29. The method according to claim 28, wherein the plurality of component categories 
are identical to the plurality of transaction categories. 

30. The method according to claim 28, wherein the step of evaluating the at least one 
transaction component includes the sub-steps of: 

defining the at least one rule; 

assigning the at least one rule to the at least one transaction component; and 

defining a plurality of parameters associated with the at least one rule, wherein each 
parameter corresponds to a particular category of the plurality of component 
categories. 

31. The method according to claim 30, further comprising the step of assigning the 
transaction to a specific transaction category out of a plurality of transaction 
categories, wherein the specific transaction category is based at least in part on the 
component category. 

32. The method according to claim 31, wherein the plurality of component categories 
are identical to the plurality of transaction categories. 

33. A computer program stored on a computer readable medium, wherein the 
computer program causes a computer to act as a system comprising: 

a receiver (2216) for receiving a request to underwrite an insurance application, 
wherein the insurance application includes at least one application component; 

an evaluation module (2224) for evaluating the at least one application component 
based on at least one rule associated with the at least one application component; and 

an assignment module for: 

a) assigning a measurement (2226) to the at least one application component; and 

b) assigning the at least one application component (2222) to a specific component 
category out of a plurality of component categories based on the assigned 
measurement. 

34. The computer readable medium according to claim 33, wherein the assignment 
module further assigns the insurance application to a specific application category out 
of a plurality of application categories, wherein the specific application category is 
based at least in part on the specific component category. 

35. The computer readable medium according to claim 34, wherein the plurality of 
component categories are identical to the plurality of application categories. 

36. The computer readable medium according to claim 33, wherein the evaluation of 
the at least one application component includes: 

defining the at least one rule; 

assigning the at least one rule to the at least one component; and 

defining a plurality of parameters associated with the at least one rule, wherein each 
parameter corresponds to a particular component category of the plurality of 
component categories. 

37. The computer readable medium according to claim 36, wherein the assignment 
module further assigns the insurance application to a specific application category out 
of a plurality of application categories, wherein the specific application category is 
based at least in part on the specific component category. 

38. The computer readable medium according to claim 37, wherein the plurality of 
component categories are identical to the plurality of application categories. 

39. The computer readable medium according to claim 33, wherein the at least one 
rule is a fuzzy logic rule. 

40. A computer program stored on a computer readable medium, wherein the 
computer program causes a computer to act as a system comprising: 

means for receiving (2216) a request to underwrite an insurance application, wherein 
the insurance application includes at least one application component; 

means for evaluating (2224) the at least one application component based on at least 
one rule associated with the at least one application component; 

means for assigning a measurement (2226) to the at least one application component; 
and 

means for assigning (2222) the at least one application component to a specific 
component category out of a plurality of component categories based on the assigned 
measurement. 

41. The computer readable medium according to claim 40, further comprising a means 
for assigning the insurance application to a specific application category out of a 
plurality of application categories, wherein the specific application category is based 
at least in part on the specific component category. 

42. The computer readable medium according to claim 41, wherein the plurality of 
component categories are identical to the plurality of application categories. 

43. The computer readable medium according to claim 40, wherein the evaluation of 
the at least one application component includes: 

defining the at least one rule; 

assigning the at least one rule to the at least one component; and 

defining a plurality of parameters associated with the at least one rule, wherein each 
parameter corresponds to a particular component category of the plurality of 
component categories. 

44. The computer readable medium according to claim 43, further comprising a means 
for assigning the insurance application to a specific application category out of a 
plurality of application categories, wherein the specific application category is based 
at least in part on the specific component category. 

45. The computer readable medium according to claim 44, wherein the plurality of 
component categories are identical to the plurality of application categories. 

46. The computer readable medium according to claim 40, wherein the at least one 
rule is a fuzzy logic rule. 